From brian.goetz at oracle.com Sat Dec 1 05:43:44 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 01 Dec 2012 08:43:44 -0500 Subject: Stream generators In-Reply-To: References: <50B8EB21.7010401@oracle.com> <50B8F208.4000100@oracle.com> Message-ID: <50BA0990.8090100@oracle.com> The issue of creating your own streams via merging is tricky because we're missing the language features (yield, laziness) that tend to make that easy. You can iterate the contents of a Stream with Iterator; you can create a new Stream by creating an Iterator. Writing Iterators by hand is unpleasant. So I wrote a clunky "stream merger" class that was about a page of code to deal with the impedance mismatches; it deals with buffering elements from the input and output iterators, and is driven by a lambda that receives an extra "controller" argument that lets it control the input streams, as well as a peek at the current left and right values. With that, writing your first example looked like: StreamMerger summer = new StreamMerger((cursor, peekLeft, peekRight) -> { if (cursor.rightIsEmpty()) return cursor.advanceLeft(); else if (cursor.leftIsEmpty()) return cursor.advanceRight(); else return cursor.advanceLeft() + cursor.advanceRight(); }); Stream sum = summer.merge(leftStream, rightStream); The stream merger creates an Iterator that consults the lambda about what to do. (Alternatively you could write it as a SAM abstract class, and implement the "next" method.) Not quite as pretty as your examples, but serviceable. On 11/30/2012 3:38 PM, Joe Bowbeer wrote: > Here are two examples of methods that combine two streams: > > 1. An addStreams method, like the one below written in Python that > generates a stream by summing two other streams: > > def addStreams(si, sj): # add corres. 
elements of two streams > if not si: return sj > if not sj: return si > tailsum = lambda ti=si, tj=sj : addStreams(tail(ti), tail(tj)) > return (head(si) + head(sj), tailsum) > > Source: > > http://code.activestate.com/lists/python-list/9029/ > > > 2. A merge method, like the one below written in Scala that generates > Hamming numbers: > > def hamming: Stream[Int] = > Stream.cons(1, merge(hamming.map(2*), merge(hamming.map(3*), > hamming.map(5*)))) > > Where the merge method is defined: > > def merge[T](a: Stream[T], b: Stream[T])(implicit ord: Ordering[T]): > Stream[T] = { > if (b.isEmpty) > a > else if (a.isEmpty) > b > else { > val order = ord.compare(a.head, b.head) > if (order < 0) > Stream.cons(a.head, merge(a.tail, b)) > else if (order > 0) > Stream.cons(b.head, merge(a, b.tail)) > else > Stream.cons(a.head, merge(a.tail, b.tail)) > } > } > > Source: > > http://code.google.com/p/wing-ding/source/browse/trunk/books/Programming_Scala/src/adhoc/HammingNumbers/src/hamming/Main.scala > > > Is it easy to write the corresponding methods in Java? > > Joe > > > > On Fri, Nov 30, 2012 at 10:09 AM, Joe Bowbeer > wrote: > > I mean a more general merge which may emit an element from either > stream, depending, and may drop some elements from one or both streams. > > On Nov 30, 2012 9:51 AM, "Brian Goetz" > wrote: > > I think it would be beneficial for comparison to show a bit > of their > implementations. > > > Here's iterate(seed, UnaryOperator): > > public static Stream iterate(final T seed, final > UnaryOperator f) { > Objects.requireNonNull(f); > final InfiniteIterator iterator = new > InfiniteIterator() { > T t = null; > > @Override > public T next() { > return t = (t == null) ? seed : f.operate(t); > } > }; > return stream(new > StreamSource.ForIterator<>(__iterator), StreamOpFlag.IS_ORDERED); > } > > Not too difficult. But, the idea is to make things that are > easy in the header of a for-loop to be easy as the source of a > stream. 
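For comparison, Joe's Python addStreams can be approximated in Java today by hand-rolling the Iterator, as Brian describes. A minimal sketch, assuming Integer elements; the addStreams name and MergeDemo class are illustrative only, not proposed API:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MergeDemo {
    // Element-wise sum of two integer sequences; once one side is
    // exhausted, the remaining elements of the other side pass through
    // unchanged (mirroring the base cases of the Python version).
    static Iterator<Integer> addStreams(Iterator<Integer> a, Iterator<Integer> b) {
        return new Iterator<Integer>() {
            public boolean hasNext() { return a.hasNext() || b.hasNext(); }
            public Integer next() {
                if (!b.hasNext()) return a.next();
                if (!a.hasNext()) return b.next();
                return a.next() + b.next();
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3);
        List<Integer> ys = Arrays.asList(10, 20);
        Iterator<Integer> sum = addStreams(xs.iterator(), ys.iterator());
        StringBuilder sb = new StringBuilder();
        while (sum.hasNext()) sb.append(sum.next()).append(' ');
        System.out.println(sb.toString().trim()); // 11 22 3
    }
}
```

An Iterator-backed source like this is the kind of thing a stream-source factory (such as the ForIterator wrapper shown in the iterate implementation) could then turn back into a Stream.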
> > repeat(n) in Scheme is about 10 characters. > > > Yeah, well this is Java... > > How difficult is it to implement a merge, as might be needed > to generate > Hamming numbers? (One of my favorite test cases.) > > > You mean, interleave two streams? That's on our list to > implement as Streams.interleave(a, b). > > Is there a method to limit a stream to a length? If so then > one of your > methods may be extra baggage. > > > Yes: stream.limit(n). > > From forax at univ-mlv.fr Sat Dec 1 05:48:27 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 01 Dec 2012 14:48:27 +0100 Subject: Stream generators In-Reply-To: <50BA0990.8090100@oracle.com> References: <50B8EB21.7010401@oracle.com> <50B8F208.4000100@oracle.com> <50BA0990.8090100@oracle.com> Message-ID: <50BA0AAB.5020800@univ-mlv.fr> On 12/01/2012 02:43 PM, Brian Goetz wrote: > The issue of creating your own streams via merging is tricky because > we're missing the language features (yield, laziness) that tend to > make that easy. You can iterate the contents of a Stream with > Iterator; you can create a new Stream by creating an Iterator. Writing > Iterators by hand is unpleasant. yield is just at the gate. There is a patch in the mlvm repo waiting for some love. R?mi From kevinb at google.com Sat Dec 1 09:34:28 2012 From: kevinb at google.com (Kevin Bourrillion) Date: Sat, 1 Dec 2012 12:34:28 -0500 Subject: Stream.flatMap signature is not correct In-Reply-To: <50B96108.8090901@univ-mlv.fr> References: <50B96108.8090901@univ-mlv.fr> Message-ID: On Fri, Nov 30, 2012 at 8:44 PM, Remi Forax wrote: Stream.flatMap is currently specified as: > > Stream flatMap(FlatMapper mapper); > > but should be: > Stream flatMap(FlatMapper mapper); > I don't understand. If the mapper produces Numbers how can you return a stream of Integers? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121201/bbda691e/attachment.html From forax at univ-mlv.fr Sat Dec 1 09:55:19 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 01 Dec 2012 18:55:19 +0100 Subject: Stream.flatMap signature is not correct In-Reply-To: References: <50B96108.8090901@univ-mlv.fr> Message-ID: <50BA4487.1060903@univ-mlv.fr> On 12/01/2012 06:34 PM, Kevin Bourrillion wrote: > On Fri, Nov 30, 2012 at 8:44 PM, Remi Forax > wrote: > > Stream.flatMap is currently specified as: > > Stream flatMap(FlatMapper mapper); > > but should be: > Stream flatMap(FlatMapper mapper); > > > I don't understand. If the mapper produces Numbers how can you return > a stream of Integers? sorry, never send a mail Friday evening. Rémi From brian.goetz at oracle.com Sat Dec 1 11:10:53 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 01 Dec 2012 14:10:53 -0500 Subject: Bikeshed opportunity: filter/map/reduce naming In-Reply-To: <50B95553.2000806@cs.oswego.edu> References: <50B940F4.50600@oracle.com> <50B95553.2000806@cs.oswego.edu> Message-ID: <50BA563D.5090404@oracle.com> >> But, people have complained about filter because they can't tell >> whether we are >> filtering OUT the elements matching the predicate, or including them. >> Some of >> these people have suggested "where(Predicate)" as an alternative. >> Which seems >> OK to me. > > "select" is the most classic name. Especially for a database company :-) > Also "selectAny" etc. Given the database imperative you cite (there's also LINQ), if we were to use select() at all, it would be for what is now called map? xs.where(x -> x.getFoo() > 3) .select(x -> x.getBar()) ... If/when we get tuples that might actually look nice: xs.where(x -> x.getFoo() > 3) .select(x -> #( x.getBar(), x.getBaz() )) // stream of (bar,baz) ... But probably should stay away from select entirely for these reasons. 
Still a slight preference for "where" over "filter": list.where(e -> e.hasChildren()) .map(e -> e.getFirstChild()) .minBy(Child::getAge) Other ideas? From forax at univ-mlv.fr Sat Dec 1 15:20:18 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 02 Dec 2012 00:20:18 +0100 Subject: Stream.flatMap signature is not correct In-Reply-To: <50BA4487.1060903@univ-mlv.fr> References: <50B96108.8090901@univ-mlv.fr> <50BA4487.1060903@univ-mlv.fr> Message-ID: <50BA90B2.5040603@univ-mlv.fr> On 12/01/2012 06:55 PM, Remi Forax wrote: > On 12/01/2012 06:34 PM, Kevin Bourrillion wrote: >> On Fri, Nov 30, 2012 at 8:44 PM, Remi Forax > > wrote: >> >> Stream.flatMap is currently specified as: >> >> Stream flatMap(FlatMapper mapper); >> >> but should be: >> Stream flatMap(FlatMapper mapper); >> >> >> I don't understand. If the mapper produces Numbers how can you >> return a stream of Integers? > > sorry, never send a mail Friday evening. > > Rémi > BTW, the problem of inference is still present: Arrays.asList(Object.class, String.class).stream().flatMap( (block, clazz) -> { Arrays.asStream(clazz.getFields()).forEach(block); }); // doesn't compile Arrays.asList(Object.class, String.class).stream().flatMap( (Block block, Class clazz) -> { Arrays.asStream(clazz.getFields()).forEach(block); }); // Ok Also note that it doesn't work with the signature (Block, Class) because Block is a subtype of Block. Rémi From forax at univ-mlv.fr Sat Dec 1 16:01:23 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 02 Dec 2012 01:01:23 +0100 Subject: java.util.stream Message-ID: <50BA9A53.9060705@univ-mlv.fr> The package stream contains two different views, the user view with Stream/BaseStream, Streamable and Streams and the providers view with the other classes. StreamOpFlags is in the middle and should be separated in two parts, the int flags (IS_ and NOT_) are the user view and the enum values are the spi part. 
I think the user view should be moved to java.util (or another package) and the spi view should stay in java.stream. cheers, Rémi From forax at univ-mlv.fr Sat Dec 1 16:02:28 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 02 Dec 2012 01:02:28 +0100 Subject: Stream and Iterator Message-ID: <50BA9A94.6070207@univ-mlv.fr> Why Stream is not Iterable now that it has a method iterator() ? Rémi From forax at univ-mlv.fr Sat Dec 1 16:02:39 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 02 Dec 2012 01:02:39 +0100 Subject: BaseStream & Stream Message-ID: <50BA9A9F.7050200@univ-mlv.fr> Brian, I think BaseStream (as supertype of Stream, IntStream, LongStream, etc) is not the best design. I see the fact that an IntStream should be viewed as a BaseStream as an interop issue, I think it's better if IntStream has a method asStream that returns a Stream, i.e. a real Stream not a strawman stream as BaseStream currently is. This design is in my opinion better because from the user point of view, there is only one interface, Stream. Note that java.nio.Buffer has a similar design. cheers, Rémi From brian.goetz at oracle.com Sat Dec 1 16:19:01 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 01 Dec 2012 19:19:01 -0500 Subject: Stream and Iterator In-Reply-To: <50BA9A94.6070207@univ-mlv.fr> References: <50BA9A94.6070207@univ-mlv.fr> Message-ID: <50BA9E75.7010802@oracle.com> Because the common understanding of "Iterable" is that you can repeatedly ask for an Iterator. Yes, I know that the spec doesn't say that, but that's how it's understood. Asking for an Iterator is not the common way of accessing a stream's elements; it's an escape hatch. Making it Iterable would only call attention to this and confuse people. On 12/1/2012 7:02 PM, Remi Forax wrote: > Why Stream is not Iterable now that it has a method iterator() ? 
> > R?mi > From brian.goetz at oracle.com Sat Dec 1 16:22:55 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 01 Dec 2012 19:22:55 -0500 Subject: BaseStream & Stream In-Reply-To: <50BA9A9F.7050200@univ-mlv.fr> References: <50BA9A9F.7050200@univ-mlv.fr> Message-ID: <50BA9F5F.7020204@oracle.com> BaseStream currently serves the type system more than it serves the user, which is a defect. The user would never deal in a BaseStream; the only place it shows up is in type constraints like Foo. Ideally we could get rid of it entirely. Now that the design is stabilizing, this is a good time to explore that. On 12/1/2012 7:02 PM, Remi Forax wrote: > Brian, > I think BaseStream (as supertype of Stream, IntStream, LongStream, etc) > is not the best design. > I see the fact that an IntStream should be view as a BaseStream as an > interopt issue, > I think it's better if IntStream has a method asStream that return a > Stream, > i.e. a real Stream not a strawman stream as BaseStream currently is. > > This design is in my opinion better because from the user point of view, > there is only one interface, Stream. > Note that java.nio.Buffer has a similar design. > > cheers, > R?mi > > > > From forax at univ-mlv.fr Sun Dec 2 02:46:31 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 02 Dec 2012 11:46:31 +0100 Subject: Stream and Iterator In-Reply-To: <50BA9E75.7010802@oracle.com> References: <50BA9A94.6070207@univ-mlv.fr> <50BA9E75.7010802@oracle.com> Message-ID: <50BB3187.8070404@univ-mlv.fr> On 12/02/2012 01:19 AM, Brian Goetz wrote: > Because the common understanding of "Iterable" is that you can > repeatedly ask for an Iterator. Yes, I know that the spec doesn't say > that, but that's how it's understood. Asking for an Iterator is not > the common way of accessing a stream's elements; it's an escape hatch. agree. > Making it Iterable would only call attention to this and confuse people. 
so iterator() and spliterator() should not be part of the interface either, but accessible through a method call, something like stream.escapeHatch().iterator(), and the interface EscapeHatch can be Iterable. Note that from the implementation POV escapeHatch() can just return this. Rémi > > > > On 12/1/2012 7:02 PM, Remi Forax wrote: >> Why Stream is not Iterable now that it has a method iterator() ? >> >> Rémi >> From forax at univ-mlv.fr Sun Dec 2 02:48:03 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 02 Dec 2012 11:48:03 +0100 Subject: BaseStream & Stream In-Reply-To: <50BA9F5F.7020204@oracle.com> References: <50BA9A9F.7050200@univ-mlv.fr> <50BA9F5F.7020204@oracle.com> Message-ID: <50BB31E3.4050503@univ-mlv.fr> On 12/02/2012 01:22 AM, Brian Goetz wrote: > BaseStream currently serves the type system more than it serves the > user, which is a defect. The user would never deal in a BaseStream; > the only place it shows up is in type constraints like Foo BaseStream>. Ideally we could get rid of it entirely. Now that the > design is stabilizing, this is a good time to explore that. Maybe it can be declared non-public if Foo is in the same package. Rémi > > On 12/1/2012 7:02 PM, Remi Forax wrote: >> Brian, >> I think BaseStream (as supertype of Stream, IntStream, LongStream, etc) >> is not the best design. >> I see the fact that an IntStream should be viewed as a BaseStream as an >> interop issue, >> I think it's better if IntStream has a method asStream that returns a >> Stream, >> i.e. a real Stream not a strawman stream as BaseStream currently is. >> >> This design is in my opinion better because from the user point of view, >> there is only one interface, Stream. >> Note that java.nio.Buffer has a similar design. 
>> >> cheers, >> R?mi >> >> >> >> From Donald.Raab at gs.com Sun Dec 2 17:30:24 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Sun, 2 Dec 2012 20:30:24 -0500 Subject: Bikeshed opportunity: filter/map/reduce naming In-Reply-To: <50B95553.2000806@cs.oswego.edu> References: <50B940F4.50600@oracle.com> <50B95553.2000806@cs.oswego.edu> Message-ID: <6712820CB52CFB4D842561213A77C05404BE202F57@GSCMAMP09EX.firmwide.corp.gs.com> Let it be said that Doug went to the Green Eggs and Ham before I did. IIRC GE&H was written with less than 50 words, so Dr. Seuss must have spent a lot of time getting his words just right. I think it is important to have a consistent "set" of names, with a single answer to "Where did that name come from?" GS Collections borrowed its initial set of collection names from Smalltalk (select, reject, collect, detect, injectInto). It should be trivial for Smalltalk, Ruby and Groovy developers to understand the protocol. Not everyone likes the names, but it saves potentially hundreds of hours of debate on individual bikeshed opportunities. ;-) If you choose select, you should probably choose collect. Reduce is different. You can achieve reduce with injectInto, but injectInto is more closely related to foldl I believe. So as Doug says, reduce should just be reduce. What would you like the Java names to be similar to Brian? Filter/Map/Reduce seems to be most similar to Python, Scala / Haskell / Lisp. I'm fine with any of these naming conventions. I do feel transform would be a better name for map in Java. Map is an overloaded term in Java. On the low percentage chance you go with select/collect, developers using GS Collections + Java Streams are going to either be happy or miserable. I can't tell which for sure. > > (No, Don, we're not going with the Dr. Seuss names. :) > > > > But, people have complained about filter because they can't tell > > whether we are filtering OUT the elements matching the predicate, or > > including them. 
Some of these people have suggested > > "where(Predicate)" as an alternative. Which seems OK to me. > > > > "select" is the most classic name. Especially for a database company :- > ) Also "selectAny" etc. > > > > Others find "map" too math-y. (The alternatives I can think of are > > also math-y; project, transform, apply). > > Do they think that "java.util.Set" is too mathy? > > > > > Further, "reduce" and "fold" are unfamiliar to many Java developers. > > The .NET folks went with "aggregate" to describe their > reduction/folding operations. > > > > Stick with "reduce". Nothing else means the same thing with such > clarity. > "aggregate" is crummy because it is a noun and verb. > "fold" is also crummy because it has too many slightly different > definitions out there. > > -Doug From brian.goetz at oracle.com Sun Dec 2 17:57:48 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 02 Dec 2012 20:57:48 -0500 Subject: Bikeshed opportunity: filter/map/reduce naming In-Reply-To: <6712820CB52CFB4D842561213A77C05404BE202F57@GSCMAMP09EX.firmwide.corp.gs.com> References: <50B940F4.50600@oracle.com> <50B95553.2000806@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404BE202F57@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50BC071C.4050902@oracle.com> What I'm hearing is: no one seems to have much problem with filter, map, or reduce. Ain't broke, don't fix. Going once, going twice... Next question: what about fold? What we're now calling reduce is a homogenous (TxT -> T) left-fold. What should we call the nonhomogeneous (UxT -> U) fold? Right now we're calling that "fold". On 12/2/2012 8:30 PM, Raab, Donald wrote: > Let it be said that Doug went to the Green Eggs and Ham before I did. IIRC GE&H was written with less than 50 words, so Dr. Seuss must have spent a lot of time getting his words just right. > > I think it is important to have a consistent "set" of names, with a single answer to "Where did that name come from?" 
GS Collections borrowed its initial set of collection names from Smalltalk (select, reject, collect, detect, injectInto). It should be trivial for Smalltalk, Ruby and Groovy developers to understand the protocol. Not everyone likes the names, but it saves potentially hundreds of hours of debate on individual bikeshed opportunities. ;-) > > If you choose select, you should probably choose collect. Reduce is different. You can achieve reduce with injectInto, but injectInto is more closely related to foldl I believe. So as Doug says, reduce should just be reduce. > > What would you like the Java names to be similar to Brian? > > Filter/Map/Reduce seems to be most similar to Python, Scala / Haskell / Lisp. I'm fine with any of these naming conventions. I do feel transform would be a better name for map in Java. Map is an overloaded term in Java. > > On the low percentage chance you go with select/collect, developers using GS Collections + Java Streams are going to either be happy or miserable. I can't tell which for sure. > >>> (No, Don, we're not going with the Dr. Seuss names. :) >>> >>> But, people have complained about filter because they can't tell >>> whether we are filtering OUT the elements matching the predicate, or >>> including them. Some of these people have suggested >>> "where(Predicate)" as an alternative. Which seems OK to me. >>> >> >> "select" is the most classic name. Especially for a database company :- >> ) Also "selectAny" etc. >> >> >>> Others find "map" too math-y. (The alternatives I can think of are >>> also math-y; project, transform, apply). >> >> Do they think that "java.util.Set" is too mathy? >> >>> >>> Further, "reduce" and "fold" are unfamiliar to many Java developers. >>> The .NET folks went with "aggregate" to describe their >> reduction/folding operations. >>> >> >> Stick with "reduce". Nothing else means the same thing with such >> clarity. >> "aggregate" is crummy because it is a noun and verb. 
>> "fold" is also crummy because it has too many slightly different >> definitions out there. >> >> -Doug > From sam at sampullara.com Sun Dec 2 18:16:35 2012 From: sam at sampullara.com (Sam Pullara) Date: Sun, 2 Dec 2012 18:16:35 -0800 Subject: Bikeshed opportunity: filter/map/reduce naming In-Reply-To: <50BC071C.4050902@oracle.com> References: <50B940F4.50600@oracle.com> <50B95553.2000806@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404BE202F57@GSCMAMP09EX.firmwide.corp.gs.com> <50BC071C.4050902@oracle.com> Message-ID: Thinking of Hadoop since people are starting to get familiar with that in the Java space: current reduce -> combine fold -> reduce Seems like that matches a little better. Sam On Dec 2, 2012, at 5:57 PM, Brian Goetz wrote: > What I'm hearing is: no one seems to have much problem with filter, map, or reduce. Ain't broke, don't fix. Going once, going twice... > > Next question: what about fold? What we're now calling reduce is a homogenous (TxT -> T) left-fold. What should we call the nonhomogeneous (UxT -> U) fold? Right now we're calling that "fold". > > On 12/2/2012 8:30 PM, Raab, Donald wrote: >> Let it be said that Doug went to the Green Eggs and Ham before I did. IIRC GE&H was written with less than 50 words, so Dr. Seuss must have spent a lot of time getting his words just right. >> >> I think it is important to have a consistent "set" of names, with a single answer to "Where did that name come from?" GS Collections borrowed its initial set of collection names from Smalltalk (select, reject, collect, detect, injectInto). It should be trivial for Smalltalk, Ruby and Groovy developers to understand the protocol. Not everyone likes the names, but it saves potentially hundreds of hours of debate on individual bikeshed opportunities. ;-) >> >> If you choose select, you should probably choose collect. Reduce is different. You can achieve reduce with injectInto, but injectInto is more closely related to foldl I believe. 
So as Doug says, reduce should just be reduce. >> >> What would you like the Java names to be similar to Brian? >> >> Filter/Map/Reduce seems to be most similar to Python, Scala / Haskell / Lisp. I'm fine with any of these naming conventions. I do feel transform would be a better name for map in Java. Map is an overloaded term in Java. >> >> On the low percentage chance you go with select/collect, developers using GS Collections + Java Streams are going to either be happy or miserable. I can't tell which for sure. >> >>>> (No, Don, we're not going with the Dr. Seuss names. :) >>>> >>>> But, people have complained about filter because they can't tell >>>> whether we are filtering OUT the elements matching the predicate, or >>>> including them. Some of these people have suggested >>>> "where(Predicate)" as an alternative. Which seems OK to me. >>>> >>> >>> "select" is the most classic name. Especially for a database company :- >>> ) Also "selectAny" etc. >>> >>> >>>> Others find "map" too math-y. (The alternatives I can think of are >>>> also math-y; project, transform, apply). >>> >>> Do they think that "java.util.Set" is too mathy? >>> >>>> >>>> Further, "reduce" and "fold" are unfamiliar to many Java developers. >>>> The .NET folks went with "aggregate" to describe their >>> reduction/folding operations. >>>> >>> >>> Stick with "reduce". Nothing else means the same thing with such >>> clarity. >>> "aggregate" is crummy because it is a noun and verb. >>> "fold" is also crummy because it has too many slightly different >>> definitions out there. >>> >>> -Doug >> From mike.duigou at oracle.com Tue Dec 4 21:47:20 2012 From: mike.duigou at oracle.com (Mike Duigou) Date: Tue, 4 Dec 2012 21:47:20 -0800 Subject: Request for Review : CR#8004015 : [2nd pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: References: Message-ID: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> Hello all; I have updated the proposed patch. 
The changes primarily add class and method documentation regarding handling of null for the primitive specializations. http://cr.openjdk.java.net/~mduigou/8004015/1/webrev/ http://cr.openjdk.java.net/~mduigou/8004015/1/specdiff/java/util/function/package-summary.html I've also reformatted the source for the default methods. Mike On Nov 26 2012, at 18:12 , Mike Duigou wrote: > Hello all; > > In the original patch which added the basic lambda functional interfaces, CR#8001634 [1], none of the interfaces extended other interfaces. The reason was primarily that the javac compiler did not, at the time that 8001634 was proposed, support extension methods. The compiler now supports adding of method defaults so this patch improves the functional interfaces by filling in the inheritance hierarchy. > > Adding the parent interfaces and default methods allows each functional interface to be used in more places. It is especially important for the functional interfaces which support primitive types, IntSupplier, IntFunction, IntUnaryOperator, IntBinaryOperator, etc. We expect that eventually standard implementations of these interfaces will be provided for functions like max, min, sum, etc. By extending the reference oriented functional interfaces such as Function, the primitive implementations can be used with the boxed primitive types along with the primitive types for which they are defined. 
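The inheritance pattern Mike describes (a primitive specialization extending its reference-oriented parent, with a default method bridging the boxed signature to the primitive one) can be sketched in miniature. IntOp below is a made-up stand-in for illustration, not an interface from the patch:

```java
import java.util.function.Function;

public class DefaultsDemo {
    // Illustrative sketch only: a primitive specialization that extends
    // its reference-oriented parent; the default method supplies the
    // boxing bridge once, so every lambda gets both signatures for free.
    interface IntOp extends Function<Integer, Integer> {
        int applyAsInt(int operand);

        @Override
        default Integer apply(Integer operand) {
            return applyAsInt(operand); // boxed call delegates to primitive form
        }
    }

    public static void main(String[] args) {
        IntOp doubler = x -> x * 2;
        // Usable as the primitive form...
        System.out.println(doubler.applyAsInt(21)); // 42
        // ...and as a plain Function over the boxed type.
        Function<Integer, Integer> boxed = doubler;
        System.out.println(boxed.apply(21)); // 42
    }
}
```

The single abstract method remains applyAsInt, so the lambda targets the primitive signature while still satisfying any API that asks for a Function.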
> > The patch to add parent interfaces and default methods can be found here: > > http://cr.openjdk.java.net/~mduigou/8004015/0/webrev/ > http://cr.openjdk.java.net/~mduigou/8004015/0/specdiff/java/util/function/package-summary.html > > Mike > > [1] http://hg.openjdk.java.net/jdk8/tl/jdk/rev/c2e80176a697 From david.holmes at oracle.com Tue Dec 4 22:10:30 2012 From: david.holmes at oracle.com (David Holmes) Date: Wed, 05 Dec 2012 16:10:30 +1000 Subject: Request for Review : CR#8004015 : [2nd pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> Message-ID: <50BEE556.6090602@oracle.com> Hi Mike, In multiple places: + *

xxx ... Should that be

tag? Is it actually needed? (my javadoc is a bit rusty). Aside: I didn't realise you could use {@inheritDoc} as a simple text insertion mechanism. Just to be clear, the null-handling statements are intended to be normative and apply to anyone who might provide an implementation of these classes - right? Thanks, David On 5/12/2012 3:47 PM, Mike Duigou wrote: > Hello all; > > I have updated the proposed patch. The changes primarily add class and method documentation regarding handling of null for the primitive specializations. > > http://cr.openjdk.java.net/~mduigou/8004015/1/webrev/ > http://cr.openjdk.java.net/~mduigou/8004015/1/specdiff/java/util/function/package-summary.html > > I've also reformatted the source for the default methods. > > Mike > > > On Nov 26 2012, at 18:12 , Mike Duigou wrote: > >> Hello all; >> >> In the original patch which added the basic lambda functional interfaces, CR#8001634 [1], none of the interfaces extended other interfaces. The reason was primarily that the javac compiler did not, at the time that 8001634 was proposed, support extension methods. The compiler now supports adding of method defaults so this patch improves the functional interfaces by filling in the inheritance hierarchy. >> >> Adding the parent interfaces and default methods allows each functional interface to be used in more places. It is especially important for the functional interfaces which support primitive types, IntSupplier, IntFunction, IntUnaryOperator, IntBinaryOperator, etc. We expect that eventually standard implementations of these interfaces will be provided for functions like max, min, sum, etc. By extending the reference oriented functional interfaces such as Function, the primitive implementations can be used with the boxed primitive types along with the primitive types for which they are defined. 
>> >> The patch to add parent interfaces and default methods can be found here: >> >> http://cr.openjdk.java.net/~mduigou/8004015/0/webrev/ >> http://cr.openjdk.java.net/~mduigou/8004015/0/specdiff/java/util/function/package-summary.html >> >> Mike >> >> [1] http://hg.openjdk.java.net/jdk8/tl/jdk/rev/c2e80176a697 > From dl at cs.oswego.edu Wed Dec 5 03:41:48 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 05 Dec 2012 06:41:48 -0500 Subject: Primitive streams and optional In-Reply-To: <50B7A2A6.1070905@univ-mlv.fr> References: <50AC00AC.2010600@oracle.com> <1A66BC18-FA91-4A0E-A258-3C2A594C1154@oracle.com> <69080D2A-CBD4-4CE7-A041-C550174216B8@oracle.com> <8CC6CED8-99F0-4618-9542-A1D2251713D4@oracle.com> <50B1002A.9010800@cs.oswego.edu> <50B3D45D.1080600@oracle.com> <50B62D4F.8010702@cs.oswego.edu> <50B634F5.3070501@univ-mlv.fr> <50B7827E.5050509@cs.oswego.edu> <50B7A2A6.1070905@univ-mlv.fr> Message-ID: <50BF32FC.4090204@cs.oswego.edu> On 11/29/12 13:00, Remi Forax wrote: > On 11/29/2012 04:42 PM, Doug Lea wrote: >> On 11/28/12 10:59, Remi Forax wrote: >>> On 11/28/2012 04:27 PM, Doug Lea wrote: >>>> On 11/26/12 15:43, Brian Goetz wrote: >>>> >>>>> 1. Ban nulls. This is equivalent to adding >>>>> .tee(e -> { if (e == null) throw new NPE(); } >>>>> between all stages of a pipeline. >>>>> >>>>> 2. Ignore nulls. This is what Doug is proposing, and is equivalent to adding >>>>> .filter(e -> e != null) >>>>> between all stages of a pipeline. >>>>> >>>>> 3. Tolerate nulls. This treat nulls as "just another value" and hopes that >>>>> lambdas and downstream stages can deal. >> >>>> (They do vary a little: #3 will sometimes be the most expensive, >>>> since the lack of a pre-null-check forces twistier code paths >>>> to be generated later on first dereference of a field.) >>> >>> Yes, for #3, ... And the costs for #3 increase each time you need to interpose the null object pattern internally to transform #3 into a variant of #2. 
As exemplified by the continuing set of patches for the continuing (and probably never-ending) list of cases noted on recent lambda-dev posts. I was the first person to use null-masking in java.util in an early contributed version of HashMap that remains the basis of current version. There was/is no better way out. But it adds runtime overhead and API complexity for ALL users to cope with an inadvisable usage that makes no sense anyway. This experience among others led to the non-nulls policy in j.u.c. -Doug From dl at cs.oswego.edu Wed Dec 5 04:39:09 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 05 Dec 2012 07:39:09 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute Message-ID: <50BF406D.5000805@cs.oswego.edu> One of the many reasons that this null stuff is driving me crazy is that I'm trying to make good on promises to help "elevate" the nice lambdaized methods added in ConcurrentHashMap up to at least ConcurrentMap and ideally to Map. But I don't know how to make plausible specs that say what happens with null keys and values. Pasted below is what I have for an adaptation of the most basic one, method Map.compute(). Three other similar methods computeIfAbsent(), computeIfPresent(), and merge() amount to special cases that are generally simpler to use and more efficient when they apply. After a few stabs at it, I think that the only sane way to spec and default-implement it is to say that, even if your Map implementation otherwise allows null keys/values, that this method will act as if it doesn't. Feel free to try fleshing it out yourself under different policies and see if you can come up with anything usable and humanly decodable. If you can, please let me know. /** * Computes a new mapping value given a key and its current mapped * value (or {@code null} if there is no current mapping). The * default implementation is equivalent to * *

 {@code
      *   value = remappingFunction.apply(key, map.get(key));
      *   if (value != null)
      *     map.put(key, value);
      *   else
      *     map.remove(key);
      * }
* * If the function returns {@code null}, the mapping is removed. * If the function itself throws an (unchecked) exception, the * exception is rethrown to its caller, and the current mapping is * left unchanged. For example, to either create or append new * messages to a value mapping: * *
 {@code
      * Map map = ...;
      * final String msg = ...;
      * map.compute(key, new BiFunction() {
      *   public String apply(Key k, String v) {
      *    return (v == null) ? msg : v + msg;});}}
* *

The default implementation makes no guarantees about * synchronization or atomicity properties of this method. Any * class overriding this method must specify its concurrency * properties. * * @param key key with which the specified value is to be associated * @param remappingFunction the function to compute a value * @return the new value associated with the specified key, or null if none * @throws NullPointerException if the specified key or remappingFunction * is null * @throws RuntimeException or Error if the remappingFunction does so, * in which case the mapping is unchanged */ V compute(K key, BiFunction remappingFunction); From forax at univ-mlv.fr Wed Dec 5 04:42:59 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 05 Dec 2012 13:42:59 +0100 Subject: Primitive streams and optional In-Reply-To: <50BF32FC.4090204@cs.oswego.edu> References: <50AC00AC.2010600@oracle.com> <1A66BC18-FA91-4A0E-A258-3C2A594C1154@oracle.com> <69080D2A-CBD4-4CE7-A041-C550174216B8@oracle.com> <8CC6CED8-99F0-4618-9542-A1D2251713D4@oracle.com> <50B1002A.9010800@cs.oswego.edu> <50B3D45D.1080600@oracle.com> <50B62D4F.8010702@cs.oswego.edu> <50B634F5.3070501@univ-mlv.fr> <50B7827E.5050509@cs.oswego.edu> <50B7A2A6.1070905@univ-mlv.fr> <50BF32FC.4090204@cs.oswego.edu> Message-ID: <50BF4153.5020005@univ-mlv.fr> On 12/05/2012 12:41 PM, Doug Lea wrote: > On 11/29/12 13:00, Remi Forax wrote: >> On 11/29/2012 04:42 PM, Doug Lea wrote: >>> On 11/28/12 10:59, Remi Forax wrote: >>>> On 11/28/2012 04:27 PM, Doug Lea wrote: >>>>> On 11/26/12 15:43, Brian Goetz wrote: >>>>> >>>>>> 1. Ban nulls. This is equivalent to adding >>>>>> .tee(e -> { if (e == null) throw new NPE(); } >>>>>> between all stages of a pipeline. >>>>>> >>>>>> 2. Ignore nulls. This is what Doug is proposing, and is >>>>>> equivalent to adding >>>>>> .filter(e -> e != null) >>>>>> between all stages of a pipeline. >>>>>> >>>>>> 3. Tolerate nulls. 
This treats nulls as "just another value" and >>>>>> hopes that >>>>>> lambdas and downstream stages can deal. >>> >>>>> (They do vary a little: #3 will sometimes be the most expensive, >>>>> since the lack of a pre-null-check forces twistier code paths >>>>> to be generated later on first dereference of a field.) >>>> >>>> Yes, for #3, ... > > And the costs for #3 increase each time you need to interpose the > null object pattern internally to transform #3 into a variant of #2. > As exemplified by the continuing set of patches for the > continuing (and probably never-ending) list of cases noted > on recent lambda-dev posts. The issue discussed in recent lambda-dev posts is how to pivot from code that uses null to say there is no value, because null was not supported (so #1), to code that uses a constant NO_VALUE because null is now a supported value (so #3). If #2 had been chosen, the code would have had to change too. > > I was the first person to use null-masking in java.util > in an early contributed version of HashMap that remains > the basis of the current version. There was/is no better way out. > But it adds runtime overhead and API complexity for ALL users > to cope with an inadvisable usage that makes no sense anyway. > This experience among others led to the non-nulls policy in j.u.c. If we could go back in time and disallow null for all collections, we would live in a better world, but that's not the case. BTW, the policy used by j.u.c. is not the right one either. The best policy would be: never use null. I'm pretty sure that because j.u.c. collections use null to tag entries that should be removed, and because the VM aggressively optimizes values that were never null before, code that triggers de-optimization the first time you remove an element from a concurrent collection is likely to exist.
> > -Doug R?mi From dl at cs.oswego.edu Wed Dec 5 04:56:44 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 05 Dec 2012 07:56:44 -0500 Subject: Primitive streams and optional In-Reply-To: <50BF4153.5020005@univ-mlv.fr> References: <50AC00AC.2010600@oracle.com> <1A66BC18-FA91-4A0E-A258-3C2A594C1154@oracle.com> <69080D2A-CBD4-4CE7-A041-C550174216B8@oracle.com> <8CC6CED8-99F0-4618-9542-A1D2251713D4@oracle.com> <50B1002A.9010800@cs.oswego.edu> <50B3D45D.1080600@oracle.com> <50B62D4F.8010702@cs.oswego.edu> <50B634F5.3070501@univ-mlv.fr> <50B7827E.5050509@cs.oswego.edu> <50B7A2A6.1070905@univ-mlv.fr> <50BF32FC.4090204@cs.oswego.edu> <50BF4153.5020005@univ-mlv.fr> Message-ID: <50BF448C.7060002@cs.oswego.edu> On 12/05/12 07:42, Remi Forax wrote: > This experience among others led to the non-nulls policy in j.u.c. > > If we can go back in time and disallow null for all collections, we will live in > a better world, but that not the case. If we could go back in time and disallow null for all streams, we will live in a better world. Oh, wait! We don't have to go back in time! We can just make a better world! > I'm pretty sure that because j.u.c collections use null to tag entry that should > be removed and because the VM aggressively optimizes value that was never null > before, code that triggers de-optimization the first time you remove an element > from a concurrent collection should exist. I'm pretty sure that all such cases are OK because they trigger different branches that would somehow need encoding anyway. So branching on null vs branching on something else is a wash. (Although there might be a case here and there where different encodings might win because they currently disable triggerings of agressive-null mechanics that would otherwise be enabled along branches. 
-Doug From forax at univ-mlv.fr Wed Dec 5 05:19:24 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 05 Dec 2012 14:19:24 +0100 Subject: Primitive streams and optional In-Reply-To: <50BF448C.7060002@cs.oswego.edu> References: <50AC00AC.2010600@oracle.com> <1A66BC18-FA91-4A0E-A258-3C2A594C1154@oracle.com> <69080D2A-CBD4-4CE7-A041-C550174216B8@oracle.com> <8CC6CED8-99F0-4618-9542-A1D2251713D4@oracle.com> <50B1002A.9010800@cs.oswego.edu> <50B3D45D.1080600@oracle.com> <50B62D4F.8010702@cs.oswego.edu> <50B634F5.3070501@univ-mlv.fr> <50B7827E.5050509@cs.oswego.edu> <50B7A2A6.1070905@univ-mlv.fr> <50BF32FC.4090204@cs.oswego.edu> <50BF4153.5020005@univ-mlv.fr> <50BF448C.7060002@cs.oswego.edu> Message-ID: <50BF49DC.1040803@univ-mlv.fr> On 12/05/2012 01:56 PM, Doug Lea wrote: > On 12/05/12 07:42, Remi Forax wrote: >> This experience among others led to the non-nulls policy in j.u.c. >> >> If we can go back in time and disallow null for all collections, we >> will live in >> a better world, but that not the case. > > If we could go back in time and disallow null for all streams, we will > live in a better world. Oh, wait! We don't have to go back in time! > We can just make a better world! That was the first idea, but there is not a lot of streams that are not baked by a collection. If we don't tolerate null in stream, we create an island with nice rules inside the island and a hell for people that want to go to that island or go out of that island. And we don't want a million of devs to become number 6, right ? [1]. > > >> I'm pretty sure that because j.u.c collections use null to tag entry >> that should >> be removed and because the VM aggressively optimizes value that was >> never null >> before, code that triggers de-optimization the first time you remove >> an element >> from a concurrent collection should exist. > > I'm pretty sure that all such cases are OK because they trigger > different branches that would somehow need encoding anyway. 
> So branching on null vs branching on something else is a wash. > (Although there might be a case here and there where different > encodings might win because they currently disable triggerings > of aggressive-null mechanics that would otherwise be enabled > along branches. > > -Doug > > Rémi [1] https://en.wikipedia.org/wiki/The_Prisoner From forax at univ-mlv.fr Wed Dec 5 05:40:42 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 05 Dec 2012 14:40:42 +0100 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BF406D.5000805@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> Message-ID: <50BF4EDA.1040801@univ-mlv.fr> You can distinguish between the fact that null is stored or not by using a dedicated functional interface that passes whether the value is present as a parameter. For the return value, you can use a special value, NO_VALUE. interface MapFunction { <-- better name needed Object NO_VALUE = new Object(); // returns V | NO_VALUE public Object apply(K key, boolean isPresent, V v); } As you see, the way to specify the return type is ugly and unsafe. Now, given that people are used to Map.get() returning null, I think it's better to have apply returning a V with your proposed semantics. Rémi On 12/05/2012 01:39 PM, Doug Lea wrote: > > One of the many reasons that this null stuff is driving me crazy is > that I'm trying to make good on promises to help "elevate" > the nice lambdaized methods added in ConcurrentHashMap up > to at least ConcurrentMap and ideally to Map. But I don't > know how to make plausible specs that say what happens > with null keys and values. > > Pasted below is what I have for an adaptation of the most > basic one, method Map.compute(). Three other similar methods > computeIfAbsent(), computeIfPresent(), and merge() > amount to special cases that are generally simpler to > use and more efficient when they apply.
> > After a few stabs at it, I think that the only sane way > to spec and default-implement it is to say that, even > if your Map implementation otherwise allows null keys/values, > that this method will act as if it doesn't. Feel free > to try fleshing it out yourself under different policies > and see if you can come up with anything usable and > humanly decodable. If you can, please let me know. > > > /** > * Computes a new mapping value given a key and its current mapped > * value (or {@code null} if there is no current mapping). The > * default implementation is equivalent to > * > *

 {@code
>      *   value = remappingFunction.apply(key, map.get(key));
>      *   if (value != null)
>      *     map.put(key, value);
>      *   else
>      *     map.remove(key);
>      * }
> * > * If the function returns {@code null}, the mapping is removed. > * If the function itself throws an (unchecked) exception, the > * exception is rethrown to its caller, and the current mapping is > * left unchanged. For example, to either create or append new > * messages to a value mapping: > * > *
 {@code
>      * Map map = ...;
>      * final String msg = ...;
>      * map.compute(key, new BiFunction() {
>      *   public String apply(Key k, String v) {
>      *    return (v == null) ? msg : v + msg;});}}
> * > *

The default implementation makes no guarantees about > * synchronization or atomicity properties of this method. Any > * class overriding this method must specify its concurrency > * properties. > * > * @param key key with which the specified value is to be associated > * @param remappingFunction the function to compute a value > * @return the new value associated with the specified key, or > null if none > * @throws NullPointerException if the specified key or > remappingFunction > * is null > * @throws RuntimeException or Error if the remappingFunction does > so, > * in which case the mapping is unchanged > */ > V compute(K key, > BiFunction > remappingFunction); From forax at univ-mlv.fr Wed Dec 5 06:33:02 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 05 Dec 2012 15:33:02 +0100 Subject: Remove cumulate from Stream interface Message-ID: <50BF5B1E.5060600@univ-mlv.fr> I may be wrong, but there is a simple way to implement cumulate() using map(), so I'm not sure cumulate pulls its own weight. Rémi public final Stream cumulate(final BinaryOperator operator) { return map(new Mapper() { private Object accumulator = NO_VALUE; @Override public U map(U element) { Object acc = accumulator; if (acc == NO_VALUE) { return element; } acc = operator.operate((U)acc, element); accumulator = acc; return (U)acc; } }); } From forax at univ-mlv.fr Wed Dec 5 06:38:42 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 05 Dec 2012 15:38:42 +0100 Subject: skip, limit and slice Message-ID: <50BF5C72.4090006@univ-mlv.fr> skip and limit can be written using slice(), limit(n) => slice(0, n) skip(n) => slice(n, Long.MAX_VALUE) so they are not strictly needed. Given that limit() is a known idiom, maybe only limit() and skip() should be kept, with the default implementation of limit() calling slice(0, limit).
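Rémi's suggestion can be sketched concretely. The `SimpleStream`/`ListStream` types below are illustrative stand-ins for the draft API (not the real classes); they show `limit()` and `skip()` as default methods over a single `slice()` primitive:

```java
import java.util.List;

// Toy stand-in for the draft Stream API: one abstract slice() primitive,
// with limit() and skip() as default methods delegating to it.
interface SimpleStream<T> {
    SimpleStream<T> slice(long skip, long limit);

    default SimpleStream<T> limit(long n) { return slice(0, n); }

    default SimpleStream<T> skip(long n) { return slice(n, Long.MAX_VALUE); }

    List<T> toList();
}

// A trivial List-backed implementation, just enough to exercise the defaults.
// Negative skip/limit arguments are not handled in this sketch.
final class ListStream<T> implements SimpleStream<T> {
    private final List<T> elements;

    ListStream(List<T> elements) { this.elements = elements; }

    @Override
    public SimpleStream<T> slice(long skip, long limit) {
        int from = (int) Math.min(skip, elements.size());
        // clamp limit before adding so Long.MAX_VALUE cannot overflow
        long to = Math.min(from + Math.min(limit, (long) Integer.MAX_VALUE),
                           (long) elements.size());
        return new ListStream<>(elements.subList(from, (int) to));
    }

    @Override
    public List<T> toList() { return elements; }
}
```

Under this shape, keeping or dropping `slice()` from the public surface is purely a question of convenience, which matches Brian's later observation that the real implementations are one-liners over a shared SliceOp.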
cheers, R?mi From brian.goetz at oracle.com Wed Dec 5 07:02:29 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 05 Dec 2012 10:02:29 -0500 Subject: Primitive streams and optional In-Reply-To: <50BF32FC.4090204@cs.oswego.edu> References: <50AC00AC.2010600@oracle.com> <1A66BC18-FA91-4A0E-A258-3C2A594C1154@oracle.com> <69080D2A-CBD4-4CE7-A041-C550174216B8@oracle.com> <8CC6CED8-99F0-4618-9542-A1D2251713D4@oracle.com> <50B1002A.9010800@cs.oswego.edu> <50B3D45D.1080600@oracle.com> <50B62D4F.8010702@cs.oswego.edu> <50B634F5.3070501@univ-mlv.fr> <50B7827E.5050509@cs.oswego.edu> <50B7A2A6.1070905@univ-mlv.fr> <50BF32FC.4090204@cs.oswego.edu> Message-ID: <50BF6205.4080503@oracle.com> >>>>>> 3. Tolerate nulls. This treat nulls as "just another value" and I think there may be some confusion due to the way the word "tolerance" has morphed in recent years? By "tolerate", we don't mean "all men hug and sing a song of unity", we mean "let's not have open violence in the streets." Perhaps we should call this "barely tolerate." As Doug said, the key casualty of this is modular reasoning about what will happen in various cases when there are nulls. I view this as an acceptable cost, and one borne entirely by the null-lovers. When we achieve a better world in which people don't put nulls in collections, it will become a purely theoretical concern. >>>>>> hopes that >>>>>> lambdas and downstream stages can deal. The key is "hope". We don't guarantee that all stages *can* deal, nor do we bend over backwards to accommodate. For example, the semantics of findAny/First are (as currently written) pretty much incompatible with null-bearing streams. The return type is Optional where Optional can either represent the absence of value or a present non-null value. So I think it is perfectly reasonble to say "if the stream contains nulls, this may throw NPE." 
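Brian's three policies can be expressed as ordinary pipeline stages. The sketch below is written against the java.util.stream API as it eventually shipped (with `peek` playing the role of the draft's `tee`); the `NullPolicies` holder and its method names are illustrative, not part of any proposal:

```java
import java.util.Objects;
import java.util.stream.Stream;

// The three null policies as explicit stages:
// #1 bans nulls (fail fast), #2 ignores them (filter), #3 tolerates them.
final class NullPolicies {
    static <T> Stream<T> ban(Stream<T> s) {
        return s.peek(Objects::requireNonNull);   // throws NPE at the first null element
    }

    static <T> Stream<T> ignore(Stream<T> s) {
        return s.filter(Objects::nonNull);        // silently drops nulls
    }

    static <T> Stream<T> tolerate(Stream<T> s) {
        return s;                                 // nulls flow through; downstream must cope
    }
}
```

Policy #3 is literally the identity, which is why all of its cost shows up downstream, in stages like findAny/findFirst that cannot represent a present null.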
> As exemplified by the continuing set of patches for the > continuing (and probably never-ending) list of cases noted > on recent lambda-dev posts. For many of these, I'm perfectly willing to specify that behavior in the presence of nulls is simply unpredictable; we're not necessarily patching things because we think they're bugs, so much as we're still at the point where patching is very cheap. "Using null here may void your warranty". From brian.goetz at oracle.com Wed Dec 5 07:05:24 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 05 Dec 2012 10:05:24 -0500 Subject: Remove cumulate from Stream interface In-Reply-To: <50BF5B1E.5060600@univ-mlv.fr> References: <50BF5B1E.5060600@univ-mlv.fr> Message-ID: <50BF62B4.7080406@oracle.com> Only if you don't care about parallel, and the whole value of cumulate is that prefix shows up everywhere in parallel algorithms. Plus, your Mapper will violate the to-be-written specs about statefulness/side-effects in lambdas passed to functional stream methods. On 12/5/2012 9:33 AM, Remi Forax wrote: > I maybe wrong but there is a simple way to implement cumulate() using > map(), > so I'm not sure cumulate pull its own weight. > > R?mi > > public final Stream cumulate(final BinaryOperator operator) { > return map(new Mapper() { > private Object accumulator = NO_VALUE; > > @Override > public U map(U element) { > Object acc = accumulator; > if (acc == NO_VALUE) { > return element; > } > acc = operator.operate((U)acc, element); > accumulator = acc; > return (U)acc; > } > }); > } From brian.goetz at oracle.com Wed Dec 5 07:06:22 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 05 Dec 2012 10:06:22 -0500 Subject: skip, limit and slice In-Reply-To: <50BF5C72.4090006@univ-mlv.fr> References: <50BF5C72.4090006@univ-mlv.fr> Message-ID: <50BF62EE.8050009@oracle.com> Correct. 
If you look at the implementation: @Override public Stream limit(long limit) { return pipeline(new SliceOp(0, limit)); } @Override public Stream skip(long toSkip) { return pipeline(new SliceOp(toSkip)); } @Override public Stream slice(long skip, long limit) { return pipeline(new SliceOp(skip, limit)); } they are strictly for convenience. On 12/5/2012 9:38 AM, Remi Forax wrote: > skip and limit can be written using slice(), > limit(n) => slice(0, n) > skip(n) => slice(n, Long.MAX_VALUE) > > so there are not strictly needed. > Given that limit() is a known idiom, may be only limit() and skip() > should be kept with the default implementation of limit() calling > slice(0, limit). > > cheers, > R?mi > > From dl at cs.oswego.edu Wed Dec 5 07:13:32 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 05 Dec 2012 10:13:32 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BF4EDA.1040801@univ-mlv.fr> References: <50BF406D.5000805@cs.oswego.edu> <50BF4EDA.1040801@univ-mlv.fr> Message-ID: <50BF649C.2050605@cs.oswego.edu> On 12/05/12 08:40, Remi Forax wrote: > You can distinguish between the fact that null is stored or not by using a > dedicated functional interface that send if the value is present or not as > parameter. For the return value, you can use a special value for saying NO_VALUE. > Right. This gets very low scores on usability/decodability. But the big problem is that each concrete Map class itself would need to supply something like this, because only it knows if it accepts null values. Which precludes default implementations. Well, you could default-implement this as well, but in that case you might as well take the plain default implementation in my version and let any class that wants to do differently re-spec their override using some escape-hatch wording. (The same sort of odd escape-hatch wording that is used for example in IdentityHashMap telling readers to disregard everything the Map specs say about method equals().) 
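Doug's plain default implementation can be written out as a standalone helper. The generics below are filled in by me (the archive stripped type parameters) and the helper name is illustrative; like the proposed default, it makes no atomicity guarantees:

```java
import java.util.Map;
import java.util.function.BiFunction;

// Sketch of the proposed default Map.compute() semantics: a missing mapping
// is presented to the function as null, and a null result removes the mapping,
// regardless of whether the underlying Map otherwise tolerates nulls.
final class ComputeSketch {
    static <K, V> V compute(Map<K, V> map, K key,
                            BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        V value = remappingFunction.apply(key, map.get(key));
        if (value != null)
            map.put(key, value);      // create or replace
        else
            map.remove(key);          // null result means "remove"
        return value;
    }
}
```

The create-or-append idiom from the javadoc example then reads `ComputeSketch.compute(map, key, (k, v) -> v == null ? msg : v + msg)`.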
-Doug > interface MapFunction { <-- better name needed > Object NO_VALUE = new Object(); > > // returns K| NO_VALUE > public Object apply(K key, boolean isPresent, String v); > } > > As you see the way to specify the return type is ugly and unsafe. > > Now, given that people are used to Map.get() returning null, I think it's better > to have apply returning a K with your proposed semantics. > From tim at peierls.net Wed Dec 5 07:22:58 2012 From: tim at peierls.net (Tim Peierls) Date: Wed, 5 Dec 2012 10:22:58 -0500 Subject: Primitive streams and optional In-Reply-To: <50BF6205.4080503@oracle.com> References: <50AC00AC.2010600@oracle.com> <1A66BC18-FA91-4A0E-A258-3C2A594C1154@oracle.com> <69080D2A-CBD4-4CE7-A041-C550174216B8@oracle.com> <8CC6CED8-99F0-4618-9542-A1D2251713D4@oracle.com> <50B1002A.9010800@cs.oswego.edu> <50B3D45D.1080600@oracle.com> <50B62D4F.8010702@cs.oswego.edu> <50B634F5.3070501@univ-mlv.fr> <50B7827E.5050509@cs.oswego.edu> <50B7A2A6.1070905@univ-mlv.fr> <50BF32FC.4090204@cs.oswego.edu> <50BF6205.4080503@oracle.com> Message-ID: Agree with Brian on all points below. On Wed, Dec 5, 2012 at 10:02 AM, Brian Goetz wrote: > 3. Tolerate nulls. This treat nulls as "just another value" and >>>>>>> >>>>>> > I think there may be some confusion due to the way the word "tolerance" > has morphed in recent years? By "tolerate", we don't mean "all men hug and > sing a song of unity", we mean "let's not have open violence in the > streets." Perhaps we should call this "barely tolerate." As Doug said, > the key casualty of this is modular reasoning about what will happen in > various cases when there are nulls. I view this as an acceptable cost, and > one borne entirely by the null-lovers. When we achieve a better world in > which people don't put nulls in collections, it will become a purely > theoretical concern. > > > hopes that >>>>>>> lambdas and downstream stages can deal. >>>>>>> >>>>>> > The key is "hope". 
We don't guarantee that all stages *can* deal, nor do > we bend over backwards to accommodate. > > For example, the semantics of findAny/First are (as currently written) > pretty much incompatible with null-bearing streams. The return type is > Optional where Optional can either represent the absence of value or a > present non-null value. So I think it is perfectly reasonable to say "if > the stream contains nulls, this may throw NPE." > > > As exemplified by the continuing set of patches for the >> continuing (and probably never-ending) list of cases noted >> on recent lambda-dev posts. >> > > For many of these, I'm perfectly willing to specify that behavior in the > presence of nulls is simply unpredictable; we're not necessarily patching > things because we think they're bugs, so much as we're still at the point > where patching is very cheap. > > "Using null here may void your warranty". From forax at univ-mlv.fr Wed Dec 5 07:33:43 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 05 Dec 2012 16:33:43 +0100 Subject: Stream.toArray() In-Reply-To: <505A2D81.3050306@oracle.com> References: <505A2D81.3050306@oracle.com> Message-ID: <50BF6957.90202@univ-mlv.fr> Restarting a thread that ended without a clear winner.
Currently, Stream.toArray() is specified as: Object[] toArray() which is not what users want, given the lack of reified generics and the fact that it's usually hard for a user to predict the number of elements of a Stream, the best signature seems to be: A[] toArray(Class arrayClass) with arrayClass.isArray() returning true and arrayClass.getComponentType().isPrimitive() returning false (or if you prefer Object[].class.isAssignableFrom(arrayClass) returning true) example of usage, Person[] coolPersons = persons.stream().filter(person#isCool()).toArray(Person[].class); cheers, R?mi From brian.goetz at oracle.com Wed Dec 5 07:39:56 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 05 Dec 2012 10:39:56 -0500 Subject: Stream.toArray() In-Reply-To: <50BF6957.90202@univ-mlv.fr> References: <505A2D81.3050306@oracle.com> <50BF6957.90202@univ-mlv.fr> Message-ID: <50BF6ACC.9070001@oracle.com> Agree on the general form -- toArray(clazz) is definitely better than the current bad alternatives offered by Collection. I prefer that the argument be the component class, not the array class. I think toArray(Foo.class) is far more natural to users than toArray(Foo[].class). On 12/5/2012 10:33 AM, Remi Forax wrote: > Restarting a thread that ends without clear winner. 
> > Currently, Stream.toArray() is specified as: > Object[] toArray() > > which is not what users want, given the lack of reified generics and the > fact that it's usually hard for a user to predict the number of elements > of a Stream, > the best signature seems to be: > A[] toArray(Class arrayClass) > with arrayClass.isArray() returning true and > arrayClass.getComponentType().isPrimitive() returning false > (or if you prefer Object[].class.isAssignableFrom(arrayClass) returning > true) > > example of usage, > Person[] coolPersons = > persons.stream().filter(person#isCool()).toArray(Person[].class); > > cheers, > R?mi > From david.lloyd at redhat.com Wed Dec 5 07:44:39 2012 From: david.lloyd at redhat.com (David M. Lloyd) Date: Wed, 05 Dec 2012 09:44:39 -0600 Subject: Stream.toArray() In-Reply-To: <50BF6ACC.9070001@oracle.com> References: <505A2D81.3050306@oracle.com> <50BF6957.90202@univ-mlv.fr> <50BF6ACC.9070001@oracle.com> Message-ID: <50BF6BE7.1060002@redhat.com> Agreed on the array class - another reason is that otherwise folks might expect this to work: int[] foo = stream.toArray(int[].class); On 12/05/2012 09:39 AM, Brian Goetz wrote: > Agree on the general form -- toArray(clazz) is definitely better than > the current bad alternatives offered by Collection. > > I prefer that the argument be the component class, not the array class. > I think toArray(Foo.class) is far more natural to users than > toArray(Foo[].class). > > On 12/5/2012 10:33 AM, Remi Forax wrote: >> Restarting a thread that ends without clear winner. 
>> >> Currently, Stream.toArray() is specified as: >> Object[] toArray() >> >> which is not what users want, given the lack of reified generics and the >> fact that it's usually hard for a user to predict the number of elements >> of a Stream, >> the best signature seems to be: >> A[] toArray(Class arrayClass) >> with arrayClass.isArray() returning true and >> arrayClass.getComponentType().isPrimitive() returning false >> (or if you prefer Object[].class.isAssignableFrom(arrayClass) returning >> true) >> >> example of usage, >> Person[] coolPersons = >> persons.stream().filter(person#isCool()).toArray(Person[].class); >> >> cheers, >> R?mi >> -- - DML From forax at univ-mlv.fr Wed Dec 5 08:43:38 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 05 Dec 2012 17:43:38 +0100 Subject: Stream.toArray() In-Reply-To: <50BF6BE7.1060002@redhat.com> References: <505A2D81.3050306@oracle.com> <50BF6957.90202@univ-mlv.fr> <50BF6ACC.9070001@oracle.com> <50BF6BE7.1060002@redhat.com> Message-ID: <50BF79BA.10306@univ-mlv.fr> On 12/05/2012 04:44 PM, David M. Lloyd wrote: > Agreed on the array class - another reason is that otherwise folks > might expect this to work: > > int[] foo = stream.toArray(int[].class); and they will never expect this to work ? int[] foo = stream.toArray(int.class); I've used an array class because this is what you want an instance of an array class, so I've tried: public A asArray(Class clazz); to reject toArray(int[]) but for a reason that I don't understand you can not use Object[] as bound. R?mi > > On 12/05/2012 09:39 AM, Brian Goetz wrote: >> Agree on the general form -- toArray(clazz) is definitely better than >> the current bad alternatives offered by Collection. >> >> I prefer that the argument be the component class, not the array class. >> I think toArray(Foo.class) is far more natural to users than >> toArray(Foo[].class). >> >> On 12/5/2012 10:33 AM, Remi Forax wrote: >>> Restarting a thread that ends without clear winner. 
>>> >>> Currently, Stream.toArray() is specified as: >>> Object[] toArray() >>> >>> which is not what users want, given the lack of reified generics and >>> the >>> fact that it's usually hard for a user to predict the number of >>> elements >>> of a Stream, >>> the best signature seems to be: >>> A[] toArray(Class arrayClass) >>> with arrayClass.isArray() returning true and >>> arrayClass.getComponentType().isPrimitive() returning false >>> (or if you prefer Object[].class.isAssignableFrom(arrayClass) returning >>> true) >>> >>> example of usage, >>> Person[] coolPersons = >>> persons.stream().filter(person#isCool()).toArray(Person[].class); >>> >>> cheers, >>> R?mi >>> > > From david.lloyd at redhat.com Wed Dec 5 09:28:22 2012 From: david.lloyd at redhat.com (David M. Lloyd) Date: Wed, 05 Dec 2012 11:28:22 -0600 Subject: Stream.toArray() In-Reply-To: <50BF79BA.10306@univ-mlv.fr> References: <505A2D81.3050306@oracle.com> <50BF6957.90202@univ-mlv.fr> <50BF6ACC.9070001@oracle.com> <50BF6BE7.1060002@redhat.com> <50BF79BA.10306@univ-mlv.fr> Message-ID: <50BF8436.9010609@redhat.com> On 12/05/2012 10:43 AM, Remi Forax wrote: > On 12/05/2012 04:44 PM, David M. Lloyd wrote: >> Agreed on the array class - another reason is that otherwise folks >> might expect this to work: >> >> int[] foo = stream.toArray(int[].class); > > and they will never expect this to work ? > int[] foo = stream.toArray(int.class); No, because the compiler will complain in this case, whereas in the former case, the failure will happen at runtime. Though to be fair, in the latter case the compiler would still accept this incorrect input: Integer[] foo = stream.toArray(int.class); > I've used an array class because this is what you want an instance of an > array class, so I've tried: > public A asArray(Class clazz); > to reject toArray(int[]) but for a reason that I don't understand you > can not use Object[] as bound. 
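For reference, both shapes under debate reduce to `java.lang.reflect.Array.newInstance`. The helper below is an illustrative sketch (not the draft API) taking the component class, per Brian's preference. Note that `toArray(list, int.class)` still compiles, since `int.class` has type `Class<Integer>`, and only fails at runtime, which is David's objection:

```java
import java.lang.reflect.Array;
import java.util.List;

// Illustrative toArray(componentType): allocates a T[] reflectively and fills it.
final class ToArraySketch {
    @SuppressWarnings("unchecked")
    static <T> T[] toArray(List<? extends T> elements, Class<T> componentType) {
        // int.class etc. would make Array.newInstance return an int[],
        // which cannot be cast to T[], so reject primitives up front.
        if (componentType.isPrimitive())
            throw new IllegalArgumentException("primitive component type: " + componentType);
        T[] array = (T[]) Array.newInstance(componentType, elements.size());
        return elements.toArray(array);
    }
}
```

Usage: `Person[] coolPersons = ToArraySketch.toArray(cool, Person.class);` gives a properly typed array without the caller spelling `Person[].class`.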
> > R?mi > >> >> On 12/05/2012 09:39 AM, Brian Goetz wrote: >>> Agree on the general form -- toArray(clazz) is definitely better than >>> the current bad alternatives offered by Collection. >>> >>> I prefer that the argument be the component class, not the array class. >>> I think toArray(Foo.class) is far more natural to users than >>> toArray(Foo[].class). >>> >>> On 12/5/2012 10:33 AM, Remi Forax wrote: >>>> Restarting a thread that ends without clear winner. >>>> >>>> Currently, Stream.toArray() is specified as: >>>> Object[] toArray() >>>> >>>> which is not what users want, given the lack of reified generics and >>>> the >>>> fact that it's usually hard for a user to predict the number of >>>> elements >>>> of a Stream, >>>> the best signature seems to be: >>>> A[] toArray(Class arrayClass) >>>> with arrayClass.isArray() returning true and >>>> arrayClass.getComponentType().isPrimitive() returning false >>>> (or if you prefer Object[].class.isAssignableFrom(arrayClass) returning >>>> true) >>>> >>>> example of usage, >>>> Person[] coolPersons = >>>> persons.stream().filter(person#isCool()).toArray(Person[].class); >>>> >>>> cheers, >>>> R?mi >>>> >> >> > -- - DML From mike.duigou at oracle.com Wed Dec 5 10:07:54 2012 From: mike.duigou at oracle.com (Mike Duigou) Date: Wed, 5 Dec 2012 10:07:54 -0800 Subject: Request for Review : CR#8004015 : [2nd pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: <50BEE556.6090602@oracle.com> References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> <50BEE556.6090602@oracle.com> Message-ID: <56B46C81-9364-4329-B373-80D1D80B295A@oracle.com> On Dec 4 2012, at 22:10 , David Holmes wrote: > Hi Mike, > > In multiple places: > > + *

xxx ... > > Should that be

tag? Is it actually needed? (my javadoc is a bit rusty). Many of these were added/changed by NetBeans styler. I then added additional instances. I have converted all of the

->

. I have also filed a bug against NetBeans styler: http://netbeans.org/bugzilla/show_bug.cgi?id=223342 > Aside: I didn't realise you could use {@inheritDoc} as a simple text insertion mechanism. I only learned of this in the last six months myself. :-) > Just to be clear, the null-handling statements are intended to be normative and apply to anyone who might provide an implementation of these classes - right? Correct. I would prefer that they were not but it seems unavoidable. Mike > > Thanks, > David > > On 5/12/2012 3:47 PM, Mike Duigou wrote: >> Hello all; >> >> I have updated the proposed patch. The changes primarily add class and method documentation regarding handling of null for the primitive specializations. >> >> http://cr.openjdk.java.net/~mduigou/8004015/1/webrev/ >> http://cr.openjdk.java.net/~mduigou/8004015/1/specdiff/java/util/function/package-summary.html >> >> I've also reformatted the source for the default methods. >> >> Mike >> >> >> On Nov 26 2012, at 18:12 , Mike Duigou wrote: >> >>> Hello all; >>> >>> In the original patch which added the basic lambda functional interfaces, CR#8001634 [1], none of the interfaces extended other interfaces. The reason was primarily that the javac compiler did not, at the time that 8001634 was proposed, support extension methods. The compiler now supports adding of method defaults so this patch improves the functional interfaces by filling in the inheritance hierarchy. >>> >>> Adding the parent interfaces and default methods allows each functional interface to be used in more places. It is especially important for the functional interfaces which support primitive types, IntSupplier, IntFunction, IntUnaryOperator, IntBinaryOperator, etc. We expect that eventually standard implementations of these interfaces will be provided for functions like max, min, sum, etc.
By extending the reference-oriented functional interfaces such as Function, the primitive implementations can be used with the boxed primitive types along with the primitive types for which they are defined. >>> >>> The patch to add parent interfaces and default methods can be found here: >>> >>> http://cr.openjdk.java.net/~mduigou/8004015/0/webrev/ >>> http://cr.openjdk.java.net/~mduigou/8004015/0/specdiff/java/util/function/package-summary.html >>> >>> Mike >>> >>> [1] http://hg.openjdk.java.net/jdk8/tl/jdk/rev/c2e80176a697 >> From mike.duigou at oracle.com Wed Dec 5 10:20:26 2012 From: mike.duigou at oracle.com (Mike Duigou) Date: Wed, 5 Dec 2012 10:20:26 -0800 Subject: Request for Review : CR#8004015 : [final (?) pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> Message-ID: I have updated the webrev again to fix some reported javadoc technical issues and added a null-handling specification to the {Int|Double|Long}Supplier. http://cr.openjdk.java.net/~mduigou/8004015/2/webrev/ http://cr.openjdk.java.net/~mduigou/8004015/2/specdiff/java/util/function/package-summary.html I believe that this iteration is complete (or very nearly so). Mike On Dec 4 2012, at 21:47 , Mike Duigou wrote: > Hello all; > > I have updated the proposed patch. The changes primarily add class and method documentation regarding handling of null for the primitive specializations. > > http://cr.openjdk.java.net/~mduigou/8004015/1/webrev/ > http://cr.openjdk.java.net/~mduigou/8004015/1/specdiff/java/util/function/package-summary.html > > I've also reformatted the source for the default methods. > > Mike > > > On Nov 26 2012, at 18:12 , Mike Duigou wrote: > >> Hello all; >> >> In the original patch which added the basic lambda functional interfaces, CR#8001634 [1], none of the interfaces extended other interfaces.
The reason was primarily that the javac compiler did not, at the time that 8001634 was proposed, support extension methods. The compiler now supports adding method defaults, so this patch improves the functional interfaces by filling in the inheritance hierarchy. >> >> Adding the parent interfaces and default methods allows each functional interface to be used in more places. It is especially important for the functional interfaces which support primitive types: IntSupplier, IntFunction, IntUnaryOperator, IntBinaryOperator, etc. We expect that eventually standard implementations of these interfaces will be provided for functions like max, min, sum, etc. By extending the reference-oriented functional interfaces such as Function, the primitive implementations can be used with the boxed primitive types along with the primitive types for which they are defined. >> >> The patch to add parent interfaces and default methods can be found here: >> >> http://cr.openjdk.java.net/~mduigou/8004015/0/webrev/ >> http://cr.openjdk.java.net/~mduigou/8004015/0/specdiff/java/util/function/package-summary.html >> >> Mike >> >> [1] http://hg.openjdk.java.net/jdk8/tl/jdk/rev/c2e80176a697 > From forax at univ-mlv.fr Wed Dec 5 10:47:32 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 05 Dec 2012 19:47:32 +0100 Subject: Remove cumulate from Stream interface In-Reply-To: <50BF62B4.7080406@oracle.com> References: <50BF5B1E.5060600@univ-mlv.fr> <50BF62B4.7080406@oracle.com> Message-ID: <50BF96C4.1020406@univ-mlv.fr> On 12/05/2012 04:05 PM, Brian Goetz wrote: > Only if you don't care about parallel, damn it, my cover is ruined :) > and the whole value of cumulate is that prefix shows up everywhere in > parallel algorithms. > > Plus, your Mapper will violate the to-be-written specs about > statefulness/side-effects in lambdas passed to functional stream methods. Do you really want this overly restrictive wording for streams that are sequential?
I find this unrealistic; even if you try to specify this in the doc, nobody reads the doc if not forced. Given that it will not be enforced in the code, it will be only true on paper. Rémi > > On 12/5/2012 9:33 AM, Remi Forax wrote: >> I may be wrong, but there is a simple way to implement cumulate() using >> map(), >> so I'm not sure cumulate pulls its own weight. >> >> Rémi >> >> public final Stream cumulate(final BinaryOperator operator) { >> return map(new Mapper() { >> private Object accumulator = NO_VALUE; >> >> @Override >> public U map(U element) { >> Object acc = accumulator; >> if (acc == NO_VALUE) { >> return element; >> } >> acc = operator.operate((U)acc, element); >> accumulator = acc; >> return (U)acc; >> } >> }); >> } From brian.goetz at oracle.com Wed Dec 5 10:55:44 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 05 Dec 2012 13:55:44 -0500 Subject: Remove cumulate from Stream interface In-Reply-To: <50BF96C4.1020406@univ-mlv.fr> References: <50BF5B1E.5060600@univ-mlv.fr> <50BF62B4.7080406@oracle.com> <50BF96C4.1020406@univ-mlv.fr> Message-ID: <50BF98B0.3070401@oracle.com> >> Plus, your Mapper will violate the to-be-written specs about >> statefulness/side-effects in lambdas passed to functional stream methods. > > Do you really want this overly restrictive wording for streams that are > sequential? > I find this unrealistic; even if you try to specify this in the doc, > nobody reads the doc if not forced. > Given that it will not be enforced in the code, it will be only true on > paper. It may be unrealistic, but we have to do it anyway. If someone passes a Function to map() that mutates the collection source, all bets are going to be off, and we have to say this. The spec can characterize when things are guaranteed to work; if people do things that accidentally work because they are operating in a restrictive environment, that's their concern.
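Rémi's cumulate-via-map sketch above lost its generics to the archive's HTML scrubbing, and as posted it never seeds the accumulator, so every element comes back unchanged. Restated against the final java.util.stream API (the method name and the array-as-mutable-box are mine, not proposed API), with the first-element seeding added, it looks like this. The stateful lambda is exactly what the spec wording under debate would disallow: run in parallel, it races on the shared accumulator.

```java
import java.util.function.BinaryOperator;
import java.util.stream.Stream;

class CumulateSketch {
    // Running fold (prefix scan) faked with map() and a stateful lambda.
    // Behaves only for sequential streams: a parallel stream would race
    // on the shared accumulator box.
    static <U> Stream<U> cumulate(Stream<U> s, BinaryOperator<U> op) {
        Object[] acc = new Object[1];   // one-element array as a mutable box; null = no value yet
        return s.map(e -> {
            @SuppressWarnings("unchecked")
            U next = (acc[0] == null) ? e : op.apply((U) acc[0], e);
            acc[0] = next;              // seed on the first element, then accumulate
            return next;
        });
    }
}
```

With Integer::sum this turns the sequence 1, 2, 3, 4 into the running totals 1, 3, 6, 10 — sequentially.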
From dl at cs.oswego.edu Wed Dec 5 13:10:13 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 05 Dec 2012 16:10:13 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BF406D.5000805@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> Message-ID: <50BFB835.6090707@cs.oswego.edu> On 12/05/12 07:39, Doug Lea wrote: > ... method Map.compute(). Three other similar methods > computeIfAbsent(), computeIfPresent(), and merge() > amount to special cases that are generally simpler to > use and more efficient when they apply. > > After a few stabs at it, I think that the only sane way > to spec and default-implement it is to say that, even > if your Map implementation otherwise allows null keys/values, > that this method will act as if it doesn't. Since people seem not to mind this (and Brian's and Tim's responses on other thread seem to even encourage it), the set of all four are below. Please skim through and see if you still agree. Brian: We had discussed also defining and default-implementing in Map the other ConcurrentMap methods: V putIfAbsent(K key, V value); boolean remove(Object key, Object value); boolean replace(K key, V oldValue, V newValue); V replace(K key, V value); Any thoughts? .... Additions to interface Map (with actual default implementations elided for now): /** * Attempts to compute a mapping given a key and its current * mapped value (or {@code null} if there is no current * mapping). The default implementation is equivalent to * *

      * <pre> {@code
      *   V value = remappingFunction.apply(key, map.get(key));
      *   if (value != null)
      *     map.put(key, value);
      *   else
      *     map.remove(key);
      * }</pre>
* * If the function returns {@code null}, the mapping is removed * (or remains absent if initially absent). If the function * itself throws an (unchecked) exception, the exception is * rethrown to its caller, and the current mapping is left * unchanged. For example, to either create or append new * messages to a value mapping: * *
      * <pre> {@code
      * Map map = ...;
      * final String msg = ...;
      * map.compute(key, (k, v, msg) => (v == null) ? msg : v + msg)
      * }</pre>
      *

The default implementation makes no guarantees about * synchronization or atomicity properties of this method. Any * class overriding this method must specify its concurrency * properties. * * @param key key with which the specified value is to be associated * @param remappingFunction the function to compute a value * @return the new value associated with the specified key, or null if none * @throws NullPointerException if the specified key is null and * this map does not support null keys, or the * remappingFunction is null * @throws RuntimeException or Error if the remappingFunction does so, * in which case the mapping is unchanged */ V compute(K key, BiFunction remappingFunction); /** * If the specified key is not already associated with a value, * attempts to compute its value using the given mappingFunction * and enters it into the map unless null. This is equivalent to: * *

      * <pre> {@code
      * if (map.containsKey(key))
      *   return map.get(key);
      * value = mappingFunction.apply(key);
      * if (value != null)
      *   map.put(key, value);
      * return value;}</pre>
* * If the function returns {@code null} no mapping is recorded. If * the function itself throws an (unchecked) exception, the * exception is rethrown to its caller, and no mapping is * recorded. The most common usage is to construct a new object * serving as an initial mapped value, or memoized result, as in: * *
      * <pre> {@code
      * map.computeIfAbsent(key, k -> new Value(f(k)));}</pre>
* *

The default implementation makes no guarantees about * synchronization or atomicity properties of this method. Any * class overriding this method must specify its concurrency * properties. * * @param key key with which the specified value is to be associated * @param mappingFunction the function to compute a value * @return the current (existing or computed) value associated with * the specified key, or null if the computed value is null * @throws NullPointerException if the specified key is null and * this map does not support null keys, or the * mappingFunction is null * @throws RuntimeException or Error if the mappingFunction does so, * in which case the mapping is left unestablished */ V computeIfAbsent(K key, Function mappingFunction); /** * If the given key is present, attempts to compute a new mapping * given the key and its current mapped value. This is equivalent * to: * *

      * <pre> {@code
      *   if (map.containsKey(key)) {
      *     value = remappingFunction.apply(key, map.get(key));
      *     if (value != null)
      *       map.put(key, value);
      *     else
      *       map.remove(key);
      *   }
      * }</pre>
* * If the function returns {@code null}, the mapping is removed * (or remains absent if initially absent). If the function * itself throws an (unchecked) exception, the exception is * rethrown to its caller, and the current mapping is left * unchanged. * *

The default implementation makes no guarantees about * synchronization or atomicity properties of this method. Any * class overriding this method must specify its concurrency * properties. * * @param key key with which the specified value is to be associated * @param remappingFunction the function to compute a value * @return the new value associated with the specified key, or null if none * @throws NullPointerException if the specified key is null and * this map does not support null keys, or the * remappingFunction is null * @throws RuntimeException or Error if the remappingFunction does so, * in which case the mapping is unchanged */ computeIfPresent(K key, BiFunction remappingFunction); /** * If the specified key is not already associated with a value, * associates it with the given value. Otherwise, replaces the * value with the results of the given remapping function, or * removes if {@code null}. This is equivalent to: * *

      * <pre> {@code
      *   V newValue;
      *   if (!map.containsKey(key))
      *     newValue = value;
      *   else
      *     newValue = remappingFunction.apply(map.get(key), value);
      *   if (newValue != null)
      *     map.put(key, newValue);
      *   else
      *     map.remove(key);
      * }</pre>
* * If the function returns {@code null}, the mapping is removed * (or remains absent if initially absent). If the function * itself throws an (unchecked) exception, the exception is * rethrown to its caller, and the current mapping is left * unchanged. * *

The default implementation makes no guarantees about * synchronization or atomicity properties of this method. Any * class overriding this method must specify its concurrency * properties. * * @param key key with which the specified value is to be associated * @param value the value to use if absent * @param remappingFunction the function to recompute a value if present * @return the new value associated with the specified key, or null if none * @throws NullPointerException if the specified key is null and * this map does not support null keys, or the * remappingFunction is null * @throws RuntimeException or Error if the remappingFunction does so, * in which case the mapping is unchanged */ V merge(K key, V value, BiFunction remappingFunction); From mike.duigou at oracle.com Wed Dec 5 13:25:04 2012 From: mike.duigou at oracle.com (Mike Duigou) Date: Wed, 5 Dec 2012 13:25:04 -0800 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BFB835.6090707@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> Message-ID: The problem is that you can't write an atomic putIfAbsent default method in terms of the existing Map API. Thus far we've only contemplated defaults that can match any atomicity expectations provided by the non-default methods. public default boolean isEmpty() { return size() == 0; } is just fine, but it's hard to find mutation operations which benefit from default methods. Mike On Dec 5 2012, at 13:10 , Doug Lea wrote: > On 12/05/12 07:39, Doug Lea wrote: > >> ... method Map.compute(). Three other similar methods >> computeIfAbsent(), computeIfPresent(), and merge() >> amount to special cases that are generally simpler to >> use and more efficient when they apply. >> >> After a few stabs at it, I think that the only sane way >> to spec and default-implement it is to say that, even >> if your Map implementation otherwise allows null keys/values, >> that this method will act as if it doesn't.
> > Since people seem not to mind this (and Brian's and Tim's > responses on other thread seem to even encourage it), the > set of all four are below. Please skim through and > see if you still agree. > > Brian: We had discussed also defining and default-implementing > in Map the other ConcurrentMap methods: > > V putIfAbsent(K key, V value); > boolean remove(Object key, Object value); > boolean replace(K key, V oldValue, V newValue); > V replace(K key, V value); > > Any thoughts? From brian.goetz at oracle.com Wed Dec 5 13:31:56 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 05 Dec 2012 16:31:56 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BFB835.6090707@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> Message-ID: <50BFBD4C.2010000@oracle.com> These look great to me! This default method thing was a pretty good idea! :) Nit: uses Scala fat arrow instead of Java thin arrow in examples. Bikeshed: The name "replaceIfPresent" seems nicer than "computeIfPresent" and fits in with existing replace naming. For the value-oriented CHM methods (putIfAbsent, remove, replace), I think it is reasonable to move these up to Map with the obvious defaults, with no promises about atomicity, and adjust the ConcurrentMap spec to add a comment about "unlike the Map versions, these are atomic." The null rules get more ad-hoc for these, but can still be reasonably consistent with the lambda versions; NPE on any key being null. For replace(K,V,null), we can treat that as remove-if-present and replace(K,null) can be unconditional remove. putIfAbsent(k, null) should be a no-op. remove(key, null) becomes "remove if get() returns null", I guess. ObAndIWantAPony: I would also love to have Map.merge(otherMap, mergeLambda) whose default uses your merge below, but for which you could define a better override in Map implementations that offer a better than element-wise merge. 
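The pony Brian asks for — a whole-map merge layered on the per-key merge() being specified in this thread — can be sketched in a few lines. The name mergeAll and the static-helper form are mine (what he actually wants is an interface default with an override point in concrete maps); the per-key call is today's java.util.Map.merge.

```java
import java.util.Map;
import java.util.function.BinaryOperator;

class MergeAllSketch {
    // Element-wise merge of 'other' into 'target': absent keys are inserted,
    // colliding keys are combined with fn via the per-key Map.merge().
    // Non-atomic, per the disclaimer attached to all of these defaults.
    static <K, V> void mergeAll(Map<K, V> target,
                                Map<? extends K, ? extends V> other,
                                BinaryOperator<V> fn) {
        for (Map.Entry<? extends K, ? extends V> e : other.entrySet())
            target.merge(e.getKey(), e.getValue(), fn);
    }
}
```

For example, merging {b=3, c=4} into {a=1, b=2} with Integer::sum yields {a=1, b=5, c=4}.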
On 12/5/2012 4:10 PM, Doug Lea wrote: > On 12/05/12 07:39, Doug Lea wrote: > >> ... method Map.compute(). Three other similar methods >> computeIfAbsent(), computeIfPresent(), and merge() >> amount to special cases that are generally simpler to >> use and more efficient when they apply. >> >> After a few stabs at it, I think that the only sane way >> to spec and default-implement it is to say that, even >> if your Map implementation otherwise allows null keys/values, >> that this method will act as if it doesn't. > > Since people seem not to mind this (and Brian's and Tim's > responses on other thread seem to even encourage it), the > set of all four are below. Please skim through and > see if you still agree. > > Brian: We had discussed also defining and default-implementing > in Map the other ConcurrentMap methods: > > V putIfAbsent(K key, V value); > boolean remove(Object key, Object value); > boolean replace(K key, V oldValue, V newValue); > V replace(K key, V value); > > Any thoughts? > > .... > > > Additions to interface Map (with actual default implementations > elided for now): > > > /** > * Attempts to compute a mapping given a key and its current > * mapped value (or {@code null} if there is no current > * mapping). The default implementation is equivalent to > * > *

>       * <pre> {@code
>       *   V value = remappingFunction.apply(key, map.get(key));
>       *   if (value != null)
>       *     map.put(key, value);
>       *   else
>       *     map.remove(key);
>       * }</pre>
> * > * If the function returns {@code null}, the mapping is removed > * (or remains absent if initially absent). If the function > * itself throws an (unchecked) exception, the exception is > * rethrown to its caller, and the current mapping is left > * unchanged. For example, to either create or append new > * messages to a value mapping: > * > *
>       * <pre> {@code
>       * Map map = ...;
>       * final String msg = ...;
>       * map.compute(key, (k, v, msg) => (v == null) ? msg : v + msg)
> * > *

The default implementation makes no guarantees about > * synchronization or atomicity properties of this method. Any > * class overriding this method must specify its concurrency > * properties. > * > * @param key key with which the specified value is to be associated > * @param remappingFunction the function to compute a value > * @return the new value associated with the specified key, or null > if none > * @throws NullPointerException if the specified key is null and > * this map does not support null keys, or the > * remappingFunction is null > * @throws RuntimeException or Error if the remappingFunction does so, > * in which case the mapping is unchanged > */ > V compute(K key, > BiFunction > remappingFunction); > > /** > * If the specified key is not already associated with a value, > * attempts to compute its value using the given mappingFunction > * and enters it into the map unless null. This is equivalent to: > * > *

>       * <pre> {@code
>       * if (map.containsKey(key))
>       *   return map.get(key);
>       * value = mappingFunction.apply(key);
>       * if (value != null)
>       *   map.put(key, value);
>       * return value;}</pre>
> * > * If the function returns {@code null} no mapping is recorded. If > * the function itself throws an (unchecked) exception, the > * exception is rethrown to its caller, and no mapping is > * recorded. The most common usage is to construct a new object > * serving as an initial mapped value, or memoized result, as in: > * > *
>       * <pre> {@code
>       * map.computeIfAbsent(key, k -> new Value(f(k)));}</pre>
> * > *

The default implementation makes no guarantees about > * synchronization or atomicity properties of this method. Any > * class overriding this method must specify its concurrency > * properties. > * > * @param key key with which the specified value is to be associated > * @param mappingFunction the function to compute a value > * @return the current (existing or computed) value associated with > * the specified key, or null if the computed value is null > * @throws NullPointerException if the specified key is null and > * this map does not support null keys, or the > * mappingFunction is null > * @throws RuntimeException or Error if the mappingFunction does so, > * in which case the mapping is left unestablished > */ > V computeIfAbsent(K key, Function > mappingFunction); > > /** > * If the given key is present, attempts to compute a new mapping > * given the key and its current mapped value. This is equivalent > * to: > * > *

>       * <pre> {@code
>       *   if (map.containsKey(key)) {
>       *     value = remappingFunction.apply(key, map.get(key));
>       *     if (value != null)
>       *       map.put(key, value);
>       *     else
>       *       map.remove(key);
>       *   }
>       * }</pre>
> * > * If the function returns {@code null}, the mapping is removed > * (or remains absent if initially absent). If the function > * itself throws an (unchecked) exception, the exception is > * rethrown to its caller, and the current mapping is left > * unchanged. > * > *

The default implementation makes no guarantees about > * synchronization or atomicity properties of this method. Any > * class overriding this method must specify its concurrency > * properties. > * > * @param key key with which the specified value is to be associated > * @param remappingFunction the function to compute a value > * @return the new value associated with the specified key, or null > if none > * @throws NullPointerException if the specified key is null and > * this map does not support null keys, or the > * remappingFunction is null > * @throws RuntimeException or Error if the remappingFunction does so, > * in which case the mapping is unchanged > */ > computeIfPresent(K key, > BiFunction > remappingFunction); > > /** > * If the specified key is not already associated with a value, > * associates it with the given value. Otherwise, replaces the > * value with the results of the given remapping function, or > * removes if {@code null}. This is equivalent to: > * > *

>       * <pre> {@code
>       *   V newValue;
>       *   if (!map.containsKey(key))
>       *     newValue = value;
>       *   else
>       *     newValue = remappingFunction.apply(map.get(key), value);
>       *   if (newValue != null)
>       *     map.put(key, newValue);
>       *   else
>       *     map.remove(key);
>       * }</pre>
> * > * If the function returns {@code null}, the mapping is removed > * (or remains absent if initially absent). If the function > * itself throws an (unchecked) exception, the exception is > * rethrown to its caller, and the current mapping is left > * unchanged. > * > *

The default implementation makes no guarantees about > * synchronization or atomicity properties of this method. Any > * class overriding this method must specify its concurrency > * properties. > * > * @param key key with which the specified value is to be associated > * @param value the value to use if absent > * @param remappingFunction the function to recompute a value if > present > * @return the new value associated with the specified key, or null > if none > * @throws NullPointerException if the specified key is null and > * this map does not support null keys, or the > * remappingFunction is null > * @throws RuntimeException or Error if the remappingFunction does so, > * in which case the mapping is unchanged > */ > V merge(K key, V value, > BiFunction > remappingFunction); > > > From dl at cs.oswego.edu Wed Dec 5 13:33:04 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 05 Dec 2012 16:33:04 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> Message-ID: <50BFBD90.4030701@cs.oswego.edu> On 12/05/12 16:25, Mike Duigou wrote: > The problem is that you can't write an atomic putIfAbsent default method in terms of the existing Map API. Thus far we've only contemplated defaults that can match any atomicity expectations provided by the non-default methods. > Right. The idea is that ALL of these would have the same disclaimer as the new lambda-friendly ones I listed: * *

The default implementation makes no guarantees about * synchronization or atomicity properties of this method. Any * class overriding this method must specify its concurrency * properties. * For CHM and related classes, you need each of the original four ConcurrentMap methods, and the four new functional ones because without them there's no way to get atomicity in these contexts. But some people like them for the sake of encapsulating common forms under standard names even if not atomic. I don't have a strong opinion about it, which is why I asked. -Doug From brian.goetz at oracle.com Wed Dec 5 13:46:21 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 05 Dec 2012 16:46:21 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BFBD90.4030701@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> Message-ID: <50BFC0AD.405@oracle.com> I'm fine with this. The existing Map API doesn't say anything about atomicity anywhere. The new putIfAbsent on Map wouldn't promise atomicity either; it is no worse than writing if (!map.containsKey(k)) map.put(k, v) which we do all the time when we want to do putIfAbsent against non-thread-safe maps. And the maps that promise atomicity can do better while using the same client idioms. The best example of why we want these on Map is computeIfAbsent. I could imagine that Guava never would have bothered with Multimap if computeIfAbsent were more convenient. On 12/5/2012 4:33 PM, Doug Lea wrote: > On 12/05/12 16:25, Mike Duigou wrote: >> The problem is that you can't write an atomic putIfAbsent default >> method in terms of the existing Map API. Thus far we've only >> contemplated defaults that can match any atomicity expectations >> provided by the non-default methods. >> > > Right. The idea is that ALL of these would have the same disclaimer as the > new lambda-friendly ones I listed: > > * > *

The default implementation makes no guarantees about > * synchronization or atomicity properties of this method. Any > * class overriding this method must specify its concurrency > * properties. > * > > For CHM and related classes, you need each of the original > four ConcurrentMap methods, and the four new functional ones > because without them there's no way to get atomicity > in these contexts. > But some people like them for the sake of encapsulating > common forms under standard names even if not atomic. > I don't have a strong opinion about it, which is why I asked. > > -Doug > From dl at cs.oswego.edu Wed Dec 5 13:50:35 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 05 Dec 2012 16:50:35 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BFBD4C.2010000@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD4C.2010000@oracle.com> Message-ID: <50BFC1AB.3050207@cs.oswego.edu> On 12/05/12 16:31, Brian Goetz wrote: > These look great to me! This default method thing was a pretty good idea! :) > > Nit: uses Scala fat arrow instead of Java thin arrow in examples. (Oops! A byproduct of talking about Scala in a course here yesterday :-) > > Bikeshed: The name "replaceIfPresent" seems nicer than "computeIfPresent" and > fits in with existing replace naming. OK; none of the names for it are beautiful, but either the analog-of-computeIfAbsent or replace is fine. (I at first had this as just "replace", but it can clash with the other form of replace in weird cases where you have a map of keys to functions.) > > For the value-oriented CHM methods (putIfAbsent, remove, replace), I think it is > reasonable to move these up to Map with the obvious defaults, with no promises > about atomicity, and adjust the ConcurrentMap spec to add a comment about > "unlike the Map versions, these are atomic."
The null rules get more ad-hoc for > these, but can still be reasonably consistent with the lambda versions; NPE on > any key being null. For replace(K,V,null), we can treat that as > remove-if-present and replace(K,null) can be unconditional remove. > putIfAbsent(k, null) should be a no-op. remove(key, null) becomes "remove if > get() returns null", I guess. I agree. Changing replace(K,null) would be a CHM spec change (now throws NPE) but an innocuous one since it makes an illegal case legal, and is good for sake of consistency. I'll flesh these out and resend sometime soon. > > ObAndIWantAPony: I would also love to have Map.merge(otherMap, mergeLambda) > whose default uses your merge below, but for which you could define a better > override in Map implementations that offer a better than element-wise merge. > Well, you have the pony, but it's not in your favorite color: It's easy via the CHM bulk ops. (And was once a listed example but somehow I clobbered it. I'll put it back in.) It's not in any interface now that we don't have MapStreams. And probably not a great idea to add at Map level in case the need for something like MapStream arises again. -Doug From joe.bowbeer at gmail.com Wed Dec 5 15:06:15 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Wed, 5 Dec 2012 15:06:15 -0800 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BFC1AB.3050207@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD4C.2010000@oracle.com> <50BFC1AB.3050207@cs.oswego.edu> Message-ID: I slightly prefer computeIfPresent. replaceIfPresent sounds redundant and its name does not imply computation. I'd like all the compute* methods to be listed together. > Bikeshed: The name "replaceIfPresent" seems nicer than "computeIfPresent" > and > fits in with existing replace naming. > >> OK; none of the names of it are beautiful, but >> either the analog-of-computeIfAbsent of of replace fine. 
>> (I at first had this as just "replace", >> but it can clash with the other form of replace in >> weird cases where you have a map of keys to functions.) > > > On Wed, Dec 5, 2012 at 1:50 PM, Doug Lea <dl at cs.oswego.edu>

wrote: > On 12/05/12 16:31, Brian Goetz wrote: > >> These look great to me! This default method thing was a pretty good >> idea! :) >> >> Nit: uses Scala fat arrow instead of Java thin arrow in examples. >> > > (Oops! A byproduct of talking about Scala in a course here yesterday :-) > > > >> Bikeshed: The name "replaceIfPresent" seems nicer than "computeIfPresent" >> and >> fits in with existing replace naming. >> > > OK; none of the names of it are beautiful, but > either the analog-of-computeIfAbsent of of replace fine. > (I at first had this as just "replace", > but it can clash with the other form of replace in > weird cases where you have a map of keys to functions.) > > > > >> For the value-oriented CHM methods (putIfAbsent, remove, replace), I >> think it is >> reasonable to move these up to Map with the obvious defaults, with no >> promises >> about atomicity, and adjust the ConcurrentMap spec to add a comment about >> "unlike the Map versions, these are atomic." The null rules get more >> ad-hoc for >> these, but can still be reasonably consistent with the lambda versions; >> NPE on >> any key being null. For replace(K,V,null), we can treat that as >> remove-if-present and replace(K,null) can be unconditional remove. >> putIfAbsent(k, null) should be a no-op. remove(key, null) becomes >> "remove if >> get() returns null", I guess. >> > > I agree. Changing replace(K,null) would be a CHM spec change > (now throws NPE) but an innocuous one since it makes an illegal > case legal, and is good for sake of consistency. > > I'll flesh these out and resend sometime soon. > > > >> ObAndIWantAPony: I would also love to have Map.merge(otherMap, >> mergeLambda) >> whose default uses your merge below, but for which you could define a >> better >> override in Map implementations that offer a better than element-wise >> merge. >> >> > Well, you have the pony, but it's not in your favorite color: > It's easy via the CHM bulk ops. 
(And was once a listed example > but somehow I clobbered it. I'll put it back in.) > It's not in any interface now that we don't have MapStreams. > And probably not a great idea to add at Map level in case > the need for something like MapStream arises again. > > > > -Doug > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121205/6ef1fa34/attachment.html From forax at univ-mlv.fr Wed Dec 5 15:05:53 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 06 Dec 2012 00:05:53 +0100 Subject: Remove cumulate from Stream interface In-Reply-To: <50BF98B0.3070401@oracle.com> References: <50BF5B1E.5060600@univ-mlv.fr> <50BF62B4.7080406@oracle.com> <50BF96C4.1020406@univ-mlv.fr> <50BF98B0.3070401@oracle.com> Message-ID: <50BFD351.60303@univ-mlv.fr> On 12/05/2012 07:55 PM, Brian Goetz wrote: >>> Plus, your Mapper will violate the to-be-written specs about >>> statefulness/side-effects in lambdas passed to functional stream >>> methods. >> >> Do you really want this overly restrictive wording for streams that are >> sequential ? >> I find this unrealistic, even if you try to specify this in the doc, >> nobody read the doc if not forced. >> Given that will not be enforced in the code, it will be only true on the >> paper. > > It may be unrealistic, but we have to do it anyway. If someone passes > a Function to map() that mutates the collection source, all bets are > going to be off, and we have to say this. Yes, but mutating the collection source is not the same as saying no side effects. Moreover, I have trouble understanding why the old rules can't be applied for a serial stream. If the collection is a concurrent collection, there is no problem. If the collection is not concurrent, it will fail fast. For a parallel stream, if the lambda has side effects, I agree that there is no guarantee.
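The sequential-case expectation — either the pipeline works or you get a fail-fast exception — can be seen in a small sketch. Note that per the ArrayList specification fail-fast behavior is only best-effort, so the exception should be treated as likely rather than guaranteed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class FailFastDemo {
    // Returns true if traversing a sequential stream while structurally
    // modifying its non-concurrent source triggered the fail-fast check.
    static boolean sideEffectFails() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            // The lambda mutates the stream's own source: a structural
            // modification during traversal.
            list.stream().forEach(list::add);
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }
}
```

With a concurrent source (e.g. a ConcurrentHashMap view) the same traversal would complete without the exception, which is the distinction being argued for above.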
> The spec can characterize when things are guaranteed to work; if > people do things that accidentally work because they are operating in > a restrictive environment, that's their concern. > that's why I think that a sequential stream should always work or get a fail-fast exception and a parallel stream should only work for lambdas with no side effects. Rémi From joe.bowbeer at gmail.com Wed Dec 5 15:11:37 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Wed, 5 Dec 2012 15:11:37 -0800 Subject: Remove cumulate from Stream interface In-Reply-To: <50BFD351.60303@univ-mlv.fr> References: <50BF5B1E.5060600@univ-mlv.fr> <50BF62B4.7080406@oracle.com> <50BF96C4.1020406@univ-mlv.fr> <50BF98B0.3070401@oracle.com> <50BFD351.60303@univ-mlv.fr> Message-ID: I think I agree with Rémi. Programmers should not have to think twice (and will not) about converting their for-loops into forEach forms. However, they should think twice before adding "parallel()". On Wed, Dec 5, 2012 at 3:05 PM, Remi Forax wrote: > On 12/05/2012 07:55 PM, Brian Goetz wrote: > >> Plus, your Mapper will violate the to-be-written specs about >>>> statefulness/side-effects in lambdas passed to functional stream >>>> methods. >>>> >>> >>> Do you really want this overly restrictive wording for streams that are >>> sequential ? >>> I find this unrealistic, even if you try to specify this in the doc, >>> nobody read the doc if not forced. >>> Given that will not be enforced in the code, it will be only true on the >>> paper. >>> >> >> It may be unrealistic, but we have to do it anyway. If someone passes a >> Function to map() that mutates the collection source, all bets are going to >> be off, and we have to say this. >> > > yes, but mutating the collection source is not the same as saying no side > effect. > Moreover, i have trouble to understand why the old rules can't be applied > for serial stream. If the collection is a concurrent collection, there is > no problem.
If the collection is not concurrent, it will fail fast. > > For a parallel stream, if the lambda has side effects, I agree that there > is no guarantee. > > > The spec can characterize when things are guaranteed to work; if people >> do things that accidentally work because they are operating in a >> restrictive environment, that's their concern. >> >> > that's why I think that a sequential stream should always work or get a > fail-fast exception and a parallel stream should only work for lambdas with no > side effects. > > Rémi From david.holmes at oracle.com Wed Dec 5 22:06:45 2012 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Dec 2012 16:06:45 +1000 Subject: Request for Review : CR#8004015 : [final (?) pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> Message-ID: <50C035F5.4050007@oracle.com> On 6/12/2012 4:20 AM, Mike Duigou wrote: > I have updated webrev again to fix some reported javadoc technical issues and added null handling specification to the {Int|Double|Long}Supplier. > > http://cr.openjdk.java.net/~mduigou/8004015/2/webrev/ > http://cr.openjdk.java.net/~mduigou/8004015/2/specdiff/java/util/function/package-summary.html > > I believe that this iteration is complete (or very nearly so). Sorry to be a pain but this:

    left - the left operand, must be non-null

doesn't tell you what happens if it is null. Is it not better to simply have:

    @param left the left operand
    @param right the right operand
    @throws NullPointerException if either left or right are null

? David ----- > Mike > > On Dec 4 2012, at 21:47 , Mike Duigou wrote: > >> Hello all; >> >> I have updated the proposed patch.
The changes primarily add class and method documentation regarding handling of null for the primitive specializations. >> >> http://cr.openjdk.java.net/~mduigou/8004015/1/webrev/ >> http://cr.openjdk.java.net/~mduigou/8004015/1/specdiff/java/util/function/package-summary.html >> >> I've also reformatted the source for the default methods. >> >> Mike >> >> >> On Nov 26 2012, at 18:12 , Mike Duigou wrote: >> >>> Hello all; >>> >>> In the original patch which added the basic lambda functional interfaces, CR#8001634 [1], none of the interfaces extended other interfaces. The reason was primarily that the javac compiler did not, at the time that 8001634 was proposed, support extension methods. The compiler now supports adding of method defaults so this patch improves the functional interfaces by filing in the inheritance hierarchy. >>> >>> Adding the parent interfaces and default methods allows each functional interface to be used in more places. It is especially important for the functional interfaces which support primitive types, IntSupplier, IntFunction, IntUnaryOperator, IntBinaryOperator, etc. We expect that eventually standard implementations of these interfaces will be provided for functions like max, min, sum, etc. By extending the reference oriented functional interfaces such as Function, the primitive implementations can be used with the boxed primitive types along with the primitive types for which they are defined. 
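The shape under review — a primitive specialization that extends its boxed counterpart and bridges via a default method — can be sketched as follows. IntUnaryOp is a stand-in name, and the types that ultimately ship in java.util.function need not use exactly this hierarchy:

```java
import java.util.function.Function;

// Sketch of the pattern: the primitive form supplies the single abstract
// method, and a default bridges to the boxed parent so the same lambda
// works wherever a Function<Integer, Integer> is wanted.
@FunctionalInterface
interface IntUnaryOp extends Function<Integer, Integer> {
    int applyAsInt(int operand);

    @Override
    default Integer apply(Integer operand) {
        return applyAsInt(operand);  // boxes the primitive result
    }
}

class IntUnaryOpDemo {
    // Accepts the reference-oriented parent type; a primitive-typed
    // lambda flows through the default bridge.
    static Integer viaBoxedView(Function<Integer, Integer> f, int x) {
        return f.apply(x);
    }
}
```

The interface remains a valid @FunctionalInterface because apply is supplied by the default, leaving applyAsInt as the sole abstract method.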
>>> >>> The patch to add parent interfaces and default methods can be found here: >>> >>> http://cr.openjdk.java.net/~mduigou/8004015/0/webrev/ >>> http://cr.openjdk.java.net/~mduigou/8004015/0/specdiff/java/util/function/package-summary.html >>> >>> Mike >>> >>> [1] http://hg.openjdk.java.net/jdk8/tl/jdk/rev/c2e80176a697 >> > > From dl at cs.oswego.edu Thu Dec 6 04:29:16 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 06 Dec 2012 07:29:16 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BFC1AB.3050207@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD4C.2010000@oracle.com> <50BFC1AB.3050207@cs.oswego.edu> Message-ID: <50C08F9C.3070806@cs.oswego.edu> On 12/05/12 16:50, Doug Lea wrote: > On 12/05/12 16:31, Brian Goetz wrote: >> Bikeshed: The name "replaceIfPresent" seems nicer than "computeIfPresent" and >> fits in with existing replace naming. > > OK; none of the names of it are beautiful, but > either the analog-of-computeIfAbsent of of replace fine. (This may be the most mangled sentence I have ever sent in an e-mail! The result of mis-pasting the javadocs with cursor in the middle of it and then over-undo'ing.) I'm back to agreeing with myself and Joe. "replaceIfPresent" is redundant-sounding. Better to keep the symmetry with computeIfAbsent. > > Changing replace(K,null) would be a CHM spec change > (now throws NPE) but an innocuous one . Actually, it would be a ConcurrentMap spec change, which is definitely crossing the line. It will take some further thought to see if there is a spec/wording that is both safe and useful. -Doug From david.holmes at oracle.com Thu Dec 6 04:47:14 2012 From: david.holmes at oracle.com (David Holmes) Date: Thu, 06 Dec 2012 22:47:14 +1000 Subject: Request for Review : CR#8004015 : [final (?) 
pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> <50C035F5.4050007@oracle.com> Message-ID: <50C093D2.3090909@oracle.com> Stephen, I believe that exceptions thrown should be identified using @throws - not implied. I think @param is for giving the basic description of a parameter not for explaining the semantics, or what different values of the parameter mean. YMMV David On 6/12/2012 8:23 PM, Stephen Colebourne wrote: > On 6 December 2012 06:06, David Holmes wrote: >> On 6/12/2012 4:20 AM, Mike Duigou wrote: >>> >>> I have updated webrev again to fix some reported javadoc technical issues >>> and added null handling specification to the {Int|Double|Long}Supplier. >>> >>> http://cr.openjdk.java.net/~mduigou/8004015/2/webrev/ >>> >>> http://cr.openjdk.java.net/~mduigou/8004015/2/specdiff/java/util/function/package-summary.html >>> >>> I believe that this iteration is complete (or very nearly so). >> >> Sorry to be a pain but this: >> >> left - the left operand, must be non-null >> >> doesn't tell you what happens if it is null. Is it not better to simply >> have: >> >> @param left the left operand >> @param right the right operand >> @throws NullPointerException if either left or right are null > > Whereas I use: > @param left the left operand, not null > @param right the right operand, not null > > There is an element of taste here. As I wrote up > http://blog.joda.org/2012/11/javadoc-coding-standards.html > Javadoc is read as source code as often as it is read as HTML. Thus, > not overly cluttering is important. > IMO, the @throws NPE is implied by the assertion of "not null" or > "must be non-null". > > More importantly, the use of @param scales better. For example, there > is often a case where null is treated as a default or special value. 
> The Javadoc would then look something like > > @param left the left operand, null treated as zero > @param right the right operand, null treated as zero > > This kind of information belongs with the @param, and for consistency > it is much better to also have the "not null" aspect on the @param as > well. (Everything together is easier for developers to parse) > > In summary, while I prefer my "not null" to Mike's "must be non-null", > what is proposed is fine, and better than your (David's) proposal. > > Stephen > From mike.duigou at oracle.com Thu Dec 6 07:56:09 2012 From: mike.duigou at oracle.com (Mike Duigou) Date: Thu, 6 Dec 2012 07:56:09 -0800 Subject: Request for Review : CR#8004015 : [final (?) pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: <50C093D2.3090909@oracle.com> References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> <50C035F5.4050007@oracle.com> <50C093D2.3090909@oracle.com> Message-ID: <7DC6E3DE-9C1A-4DC6-8B26-E385C7882F44@oracle.com> Something seems entirely out of balance regarding handling of null. If a method says that it takes a reference type then why ever might it be assumed that null is permitted? It feels more than a bit like we are adding "no naked people" stickers to every building entrance and "do not insert fingers" to every electrical outlet. The "@throws NPE" are yet another layer of "Violators will be arrested" or "You will be electrocuted" stickers. It seems entirely wrongheaded to assume that null could be passed in place of a valid reference unless explicitly and categorically forbidden. Accepting null should be considered extraordinary and worthy of mention only when it occurs. Being rare, it would also be a lot easier to document. So really, why mention null at all? Mike On Dec 6 2012, at 04:47 , David Holmes wrote: > Stephen, > > I believe that exceptions thrown should be identified using @throws - not implied.
> > I think @param is for giving the basic description of a parameter not for explaining the semantics, or what different values of the parameter mean. > > YMMV > > David > > On 6/12/2012 8:23 PM, Stephen Colebourne wrote: >> On 6 December 2012 06:06, David Holmes wrote: >>> On 6/12/2012 4:20 AM, Mike Duigou wrote: >>>> >>>> I have updated webrev again to fix some reported javadoc technical issues >>>> and added null handling specification to the {Int|Double|Long}Supplier. >>>> >>>> http://cr.openjdk.java.net/~mduigou/8004015/2/webrev/ >>>> >>>> http://cr.openjdk.java.net/~mduigou/8004015/2/specdiff/java/util/function/package-summary.html >>>> >>>> I believe that this iteration is complete (or very nearly so). >>> >>> Sorry to be a pain but this: >>> >>> left - the left operand, must be non-null >>> >>> doesn't tell you what happens if it is null. Is it not better to simply >>> have: >>> >>> @param left the left operand >>> @param right the right operand >>> @throws NullPointerException if either left or right are null >> >> Whereas I use: >> @param left the left operand, not null >> @param right the right operand, not null >> >> There is an element of taste here. As I wrote up >> http://blog.joda.org/2012/11/javadoc-coding-standards.html >> Javadoc is read as source code as often as it is read as HTML. Thus, >> not overly cluttering is important. >> IMO, the @throws NPE is implied by the assertion of "not null" or >> "must be non-null". >> >> More importantly, the use of @param scales better. For example, there >> is often a case where null is treated as a default or special value. >> The Javadoc would then look something like >> >> @param left the left operand, null treated as zero >> @param right the right operand, null treated as zero >> >> This kind of information belongs with the @param, and for consistency >> it is much better to also have the "not null" aspect on the @param as >> well. 
(Everything together is easier for developers to parse) >> >> In summary, while I prefer my "not null" to Mike's "must be non-null", >> what is proposed is fine, and better than your (David's) proposal. >> >> Stephen >> From joe.bowbeer at gmail.com Thu Dec 6 08:18:16 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 6 Dec 2012 08:18:16 -0800 Subject: Request for Review : CR#8004015 : [final (?) pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: <7DC6E3DE-9C1A-4DC6-8B26-E385C7882F44@oracle.com> References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> <50C035F5.4050007@oracle.com> <50C093D2.3090909@oracle.com> <7DC6E3DE-9C1A-4DC6-8B26-E385C7882F44@oracle.com> Message-ID: The documentation for Collection.add is not a reasonable model to emulate? http://docs.oracle.com/javase/6/docs/api/java/util/Collection.html#add(E) It is written from the standpoint that nulls are OK, but notes that some implementations might barf, and lists NPE among the possible exceptions. I think that's the right approach for streams as well. In a wide range of popular APIs, there are lots of methods that return null, and it is these nulls that are the most likely to show up in functionally-constructed streams. Java programmers do have the notion that nulls might not be allowed everywhere, and will look to the javadoc for clarification. On Thu, Dec 6, 2012 at 7:56 AM, Mike Duigou wrote: > Something seems entirely out of balance regarding handling of null. If a > methods says that it takes a reference type then why ever might it be > assumed that null is permitted? > > It feels more than a bit like we are adding "no naked people" stickers to > every building entrance and "do not insert fingers" to every electrical > outlet. The "@throws NPE" are yet another layer of "Violators will be > arrested" or "You will be electrocuted" stickers. 
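For comparison, the competing documentation styles can be put side by side on one hypothetical method; concat and its checks are illustrative, not code from the webrev:

```java
import java.util.Objects;

class NullDocStyles {
    /**
     * Combines the two operands.
     *
     * <p>One style documents the failure mode explicitly via the tag:
     *
     * @param left the left operand
     * @param right the right operand
     * @throws NullPointerException if either left or right is null
     */
    // An alternative style would instead annotate each parameter:
    //   @param left the left operand, not null
    // and a third position holds that a reference parameter is non-null
    // unless the doc explicitly says otherwise, so neither line is needed.
    static String concat(String left, String right) {
        Objects.requireNonNull(left, "left");
        Objects.requireNonNull(right, "right");
        return left + right;
    }
}
```

Whichever wording the javadoc uses, the implementation's Objects.requireNonNull calls are what actually enforce the contract.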
> > It seems entirely wrongheaded to assume that null could be passed in place > of a valid reference unless explicitly and categorically forbidden. > Accepting null should be considered extraordinary and worthy of mention > only when it occurs. Being rare it would also be a lot easier to document. > > So really, why mention null at all? > > Mike > > On Dec 6 2012, at 04:47 , David Holmes wrote: > > > Stephen, > > > > I believe that exceptions thrown should be identified using @throws - > not implied. > > > > I think @param is for giving the basic description of a parameter not > for explaining the semantics, or what different values of the parameter > mean. > > > > YMMV > > > > David > > > > On 6/12/2012 8:23 PM, Stephen Colebourne wrote: > >> On 6 December 2012 06:06, David Holmes wrote: > >>> On 6/12/2012 4:20 AM, Mike Duigou wrote: > >>>> > >>>> I have updated webrev again to fix some reported javadoc technical > issues > >>>> and added null handling specification to the > {Int|Double|Long}Supplier. > >>>> > >>>> http://cr.openjdk.java.net/~mduigou/8004015/2/webrev/ > >>>> > >>>> > http://cr.openjdk.java.net/~mduigou/8004015/2/specdiff/java/util/function/package-summary.html > >>>> > >>>> I believe that this iteration is complete (or very nearly so). > >>> > >>> Sorry to be a pain but this: > >>> > >>> left - the left operand, must be non-null > >>> > >>> doesn't tell you what happens if it is null. Is it not better to simply > >>> have: > >>> > >>> @param left the left operand > >>> @param right the right operand > >>> @throws NullPointerException if either left or right are null > >> > >> Whereas I use: > >> @param left the left operand, not null > >> @param right the right operand, not null > >> > >> There is an element of taste here. As I wrote up > >> http://blog.joda.org/2012/11/javadoc-coding-standards.html > >> Javadoc is read as source code as often as it is read as HTML. Thus, > >> not overly cluttering is important. 
> >> IMO, the @throws NPE is implied by the assertion of "not null" or > >> "must be non-null". > >> > >> More importantly, the use of @param scales better. For example, there > >> is often a case where null is treated as a default or special value. > >> The Javadoc would then look something like > >> > >> @param left the left operand, null treated as zero > >> @param right the right operand, null treated as zero > >> > >> This kind of information belongs with the @param, and for consistency > >> it is much better to also have the "not null" aspect on the @param as > >> well. (Everything together is easier for developers to parse) > >> > >> In summary, while I prefer my "not null" to Mike's "must be non-null", > >> what is proposed is fine, and better than your (David's) proposal. > >> > >> Stephen > >> > > From dl at cs.oswego.edu Thu Dec 6 13:10:36 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 06 Dec 2012 16:10:36 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C08F9C.3070806@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD4C.2010000@oracle.com> <50BFC1AB.3050207@cs.oswego.edu> <50C08F9C.3070806@cs.oswego.edu> Message-ID: <50C109CC.8060505@cs.oswego.edu> On 12/06/12 07:29, Doug Lea wrote: > Actually, it would be a ConcurrentMap spec change, which > is definitely crossing the line. It will take some > further thought to see if there is a spec/wording that > is both safe and useful. > Not so hard. All of the "The default implementation is equivalent to:" specs and wordings are now completely compatible with those in ConcurrentMap (which could use a few tiny touchups someday to clarify this). This omits what would have been a nice symmetry about replace(k, null) and computeIfPresent.
The only interpretation that works is for replace in non-null-accepting maps to throw NPE here. Which is what the j.u.c Maps now do anyway. I put a viewable version of the javadocs (using JDK7 javadoc) at http://gee.cs.oswego.edu/dl/wwwtmp/apis/Map.html (With nested standin versions of Function interfaces decls for now.) Comments welcome. I suspect that the style/terminology here of documenting defaults by saying they are equivalent to code snippets will usually be the most informative/useful tactic. It seems to be more understandable than the AbstractCollection etc style of trying to say it all in words. -Doug From david.holmes at oracle.com Thu Dec 6 18:02:34 2012 From: david.holmes at oracle.com (David Holmes) Date: Fri, 07 Dec 2012 12:02:34 +1000 Subject: Request for Review : CR#8004015 : [final (?) pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> <50C035F5.4050007@oracle.com> <50C093D2.3090909@oracle.com> <7DC6E3DE-9C1A-4DC6-8B26-E385C7882F44@oracle.com> Message-ID: <50C14E3A.7080907@oracle.com> Joe, On 7/12/2012 2:18 AM, Joe Bowbeer wrote: > The documentation for Collection.add is not a reasonable model to emulate? > > http://docs.oracle.com/javase/6/docs/api/java/util/Collection.html#add(E) > > It is written from the standpoint that nulls are OK, but notes that some > implementations might barf, and lists NPE among the possible exceptions. The difference here is that the intent is for nulls to be prohibited - full stop. In other places "we" avoid the API clutter by making broad statements regarding null handling "Unless otherwise stated null parameters to method or constructors of this class will result in NullPointerException being thrown". In some places we even handle this at the package doc level. Perhaps we should do the same here? Otherwise, the "right way" IMHO to indicate this is via @throws, not additional commentary on @param. Again YMMV. 
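The "equivalent to a code snippet" tactic for documenting defaults can be illustrated with computeIfPresent's sibling, computeIfAbsent. This is a plain-Map sketch with no atomicity claim, not a quote of the javadoc under review:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ComputeIfAbsentSketch {
    // Roughly the snippet such a spec would show: check, compute, and
    // install only if the function produced a non-null value. Non-atomic
    // by design; concurrent maps override with an atomic version.
    static <K, V> V computeIfAbsent(Map<K, V> map, K key,
                                    Function<? super K, ? extends V> mappingFunction) {
        V v = map.get(key);
        if (v == null) {
            V newValue = mappingFunction.apply(key);
            if (newValue != null) {
                map.put(key, newValue);
                return newValue;
            }
        }
        return v;
    }
}
```

A null result from the mapping function leaves the map unchanged, which is what lets null stand for "no mapping" without the replace(k, null) ambiguity discussed above.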
David ----- > I think that's the right approach for streams as well. > > In a wide range of popular APIs, there are lots of methods that return > null, and it is these nulls that are the most likely to show up in > functionally-constructed streams. Java programmers do have the notion that > nulls might not be allowed everywhere, and will look to the javadoc for > clarification. > > > > On Thu, Dec 6, 2012 at 7:56 AM, Mike Duigou wrote: > >> Something seems entirely out of balance regarding handling of null. If a >> methods says that it takes a reference type then why ever might it be >> assumed that null is permitted? >> >> It feels more than a bit like we are adding "no naked people" stickers to >> every building entrance and "do not insert fingers" to every electrical >> outlet. The "@throws NPE" are yet another layer of "Violators will be >> arrested" or "You will be electrocuted" stickers. >> >> It seems entirely wrongheaded to assume that null could be passed in place >> of a valid reference unless explicitly and categorically forbidden. >> Accepting null should be considered extraordinary and worthy of mention >> only when it occurs. Being rare it would also be a lot easier to document. >> >> So really, why mention null at all? >> >> Mike >> >> On Dec 6 2012, at 04:47 , David Holmes wrote: >> >>> Stephen, >>> >>> I believe that exceptions thrown should be identified using @throws - >> not implied. >>> >>> I think @param is for giving the basic description of a parameter not >> for explaining the semantics, or what different values of the parameter >> mean. >>> >>> YMMV >>> >>> David >>> >>> On 6/12/2012 8:23 PM, Stephen Colebourne wrote: >>>> On 6 December 2012 06:06, David Holmes wrote: >>>>> On 6/12/2012 4:20 AM, Mike Duigou wrote: >>>>>> >>>>>> I have updated webrev again to fix some reported javadoc technical >> issues >>>>>> and added null handling specification to the >> {Int|Double|Long}Supplier. 
>>>>>> >>>>>> http://cr.openjdk.java.net/~mduigou/8004015/2/webrev/ >>>>>> >>>>>> >> http://cr.openjdk.java.net/~mduigou/8004015/2/specdiff/java/util/function/package-summary.html >>>>>> >>>>>> I believe that this iteration is complete (or very nearly so). >>>>> >>>>> Sorry to be a pain but this: >>>>> >>>>> left - the left operand, must be non-null >>>>> >>>>> doesn't tell you what happens if it is null. Is it not better to simply >>>>> have: >>>>> >>>>> @param left the left operand >>>>> @param right the right operand >>>>> @throws NullPointerException if either left or right are null >>>> >>>> Whereas I use: >>>> @param left the left operand, not null >>>> @param right the right operand, not null >>>> >>>> There is an element of taste here. As I wrote up >>>> http://blog.joda.org/2012/11/javadoc-coding-standards.html >>>> Javadoc is read as source code as often as it is read as HTML. Thus, >>>> not overly cluttering is important. >>>> IMO, the @throws NPE is implied by the assertion of "not null" or >>>> "must be non-null". >>>> >>>> More importantly, the use of @param scales better. For example, there >>>> is often a case where null is treated as a default or special value. >>>> The Javadoc would then look something like >>>> >>>> @param left the left operand, null treated as zero >>>> @param right the right operand, null treated as zero >>>> >>>> This kind of information belongs with the @param, and for consistency >>>> it is much better to also have the "not null" aspect on the @param as >>>> well. (Everything together is easier for developers to parse) >>>> >>>> In summary, while I prefer my "not null" to Mike's "must be non-null", >>>> what is proposed is fine, and better than your (David's) proposal. >>>> >>>> Stephen >>>> >> >> > From joe.bowbeer at gmail.com Thu Dec 6 19:45:48 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 6 Dec 2012 19:45:48 -0800 Subject: Request for Review : CR#8004015 : [final (?) 
pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: <50C14E3A.7080907@oracle.com> References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> <50C035F5.4050007@oracle.com> <50C093D2.3090909@oracle.com> <7DC6E3DE-9C1A-4DC6-8B26-E385C7882F44@oracle.com> <50C14E3A.7080907@oracle.com> Message-ID: Ah, yes, I misunderstood the intent. Sorry. I agree that the null wording should be minimized, esp. in the parameter descriptions. The @throws NPE descriptions make it clear in every occurrence that nulls are prevented, which I like in this case, but do not distract the casual reader. On Thu, Dec 6, 2012 at 6:02 PM, David Holmes wrote: > Joe, > > > On 7/12/2012 2:18 AM, Joe Bowbeer wrote: > >> The documentation for Collection.add is not a reasonable model to emulate? >> >> http://docs.oracle.com/javase/**6/docs/api/java/util/** >> Collection.html#add(E) >> >> It is written from the standpoint that nulls are OK, but notes that some >> implementations might barf, and lists NPE among the possible exceptions. >> > > The difference here is that the intent is for nulls to be prohibited - > full stop. > > In other places "we" avoid the API clutter by making broad statements > regarding null handling "Unless otherwise stated null parameters to method > or constructors of this class will result in NullPointerException being > thrown". In some places we even handle this at the package doc level. > Perhaps we should do the same here? > > Otherwise, the "right way" IMHO to indicate this is via @throws, not > additional commentary on @param. > > Again YMMV. > > David > ----- > > > I think that's the right approach for streams as well. >> >> In a wide range of popular APIs, there are lots of methods that return >> null, and it is these nulls that are the most likely to show up in >> functionally-constructed streams. Java programmers do have the notion >> that >> nulls might not be allowed everywhere, and will look to the javadoc for >> clarification. 
>> >> >> >> On Thu, Dec 6, 2012 at 7:56 AM, Mike Duigou >> wrote: >> >> Something seems entirely out of balance regarding handling of null. If a >>> methods says that it takes a reference type then why ever might it be >>> assumed that null is permitted? >>> >>> It feels more than a bit like we are adding "no naked people" stickers to >>> every building entrance and "do not insert fingers" to every electrical >>> outlet. The "@throws NPE" are yet another layer of "Violators will be >>> arrested" or "You will be electrocuted" stickers. >>> >>> It seems entirely wrongheaded to assume that null could be passed in >>> place >>> of a valid reference unless explicitly and categorically forbidden. >>> Accepting null should be considered extraordinary and worthy of mention >>> only when it occurs. Being rare it would also be a lot easier to >>> document. >>> >>> So really, why mention null at all? >>> >>> Mike >>> >>> On Dec 6 2012, at 04:47 , David Holmes wrote: >>> >>> Stephen, >>>> >>>> I believe that exceptions thrown should be identified using @throws - >>>> >>> not implied. >>> >>>> >>>> I think @param is for giving the basic description of a parameter not >>>> >>> for explaining the semantics, or what different values of the parameter >>> mean. >>> >>>> >>>> YMMV >>>> >>>> David >>>> >>>> On 6/12/2012 8:23 PM, Stephen Colebourne wrote: >>>> >>>>> On 6 December 2012 06:06, David Holmes >>>>> wrote: >>>>> >>>>>> On 6/12/2012 4:20 AM, Mike Duigou wrote: >>>>>> >>>>>>> >>>>>>> I have updated webrev again to fix some reported javadoc technical >>>>>>> >>>>>> issues >>> >>>> and added null handling specification to the >>>>>>> >>>>>> {Int|Double|Long}Supplier. >>> >>>> >>>>>>> http://cr.openjdk.java.net/~**mduigou/8004015/2/webrev/ >>>>>>> >>>>>>> >>>>>>> http://cr.openjdk.java.net/~**mduigou/8004015/2/specdiff/** >>> java/util/function/package-**summary.html >>> >>>> >>>>>>> I believe that this iteration is complete (or very nearly so). 
>>>>>>> >>>>>> >>>>>> Sorry to be a pain but this: >>>>>> >>>>>> left - the left operand, must be non-null >>>>>> >>>>>> doesn't tell you what happens if it is null. Is it not better to >>>>>> simply >>>>>> have: >>>>>> >>>>>> @param left the left operand >>>>>> @param right the right operand >>>>>> @throws NullPointerException if either left or right are null >>>>>> >>>>> >>>>> Whereas I use: >>>>> @param left the left operand, not null >>>>> @param right the right operand, not null >>>>> >>>>> There is an element of taste here. As I wrote up >>>>> http://blog.joda.org/2012/11/**javadoc-coding-standards.html >>>>> Javadoc is read as source code as often as it is read as HTML. Thus, >>>>> not overly cluttering is important. >>>>> IMO, the @throws NPE is implied by the assertion of "not null" or >>>>> "must be non-null". >>>>> >>>>> More importantly, the use of @param scales better. For example, there >>>>> is often a case where null is treated as a default or special value. >>>>> The Javadoc would then look something like >>>>> >>>>> @param left the left operand, null treated as zero >>>>> @param right the right operand, null treated as zero >>>>> >>>>> This kind of information belongs with the @param, and for consistency >>>>> it is much better to also have the "not null" aspect on the @param as >>>>> well. (Everything together is easier for developers to parse) >>>>> >>>>> In summary, while I prefer my "not null" to Mike's "must be non-null", >>>>> what is proposed is fine, and better than your (David's) proposal. >>>>> >>>>> Stephen >>>>> >>>>> >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121206/5efc65f0/attachment.html From david.holmes at oracle.com Thu Dec 6 23:18:24 2012 From: david.holmes at oracle.com (David Holmes) Date: Fri, 07 Dec 2012 17:18:24 +1000 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50BFBD90.4030701@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> Message-ID: <50C19840.6070909@oracle.com> On 6/12/2012 7:33 AM, Doug Lea wrote: > On 12/05/12 16:25, Mike Duigou wrote: >> The problem is that you can't write an atomic putIfAbsent default >> method in terms of the existing Map API. Thus far we've only >> contemplated defaults that can match any atomicity expectations >> provided by the non-default methods. >> > > Right. The idea is that ALL of these would have the same disclaimer as the > new lambda-friendly ones I listed: > > * > *

The default implementation makes no guarantees about > * synchronization or atomicity properties of this method. Any > * class overriding this method must specify its concurrency > * properties. > * Which means that ConcurrentMap has to re-abstract all the new methods as their default implementations are invalid for maps that people expect atomic operations from. David ----- > For CHM and related classes, you need each of the original > four ConcurrentMap methods, and the four new functional ones > because without them there's no way to get atomicity > in these contexts. > But some people like them for the sake of encapsulating > common forms under standard names even if not atomic. > I don't have a strong opinion about it, which is why I asked. > > -Doug > From dl at cs.oswego.edu Fri Dec 7 04:58:39 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 07 Dec 2012 07:58:39 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C19840.6070909@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> Message-ID: <50C1E7FF.8040307@cs.oswego.edu> On 12/07/12 02:18, David Holmes wrote: >> Right. The idea is that ALL of these would have the same disclaimer as the >> new lambda-friendly ones I listed: >> >> * >> *

The default implementation makes no guarantees about >> * synchronization or atomicity properties of this method. Any >> * class overriding this method must specify its concurrency >> * properties. >> * > > Which means that ConcurrentMap has to re-abstract all the new methods as their > default implementations are invalid for maps that people expect atomic > operations from. Yes/no. There is some risk in doing this in the way I posted for both the four existing ConcurrentMap and four added functional-form methods: A concrete concurrent class will no longer be forced by a compiler to override to provide atomicity. And there is of course no way we can do it for them. But users will naturally expect otherwise. This is why I've been fleshing these all out, so people could help decide if this is the best move. I'm still thinking barely-yes, but am a little nervous about it. -Doug From david.holmes at oracle.com Fri Dec 7 05:04:46 2012 From: david.holmes at oracle.com (David Holmes) Date: Fri, 07 Dec 2012 23:04:46 +1000 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C1E7FF.8040307@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> Message-ID: <50C1E96E.5020507@oracle.com> On 7/12/2012 10:58 PM, Doug Lea wrote: > On 12/07/12 02:18, David Holmes wrote: > >>> Right. The idea is that ALL of these would have the same disclaimer >>> as the >>> new lambda-friendly ones I listed: >>> >>> * >>> *

The default implementation makes no guarantees about >>> * synchronization or atomicity properties of this method. Any >>> * class overriding this method must specify its concurrency >>> * properties. >>> * >> >> Which means that ConcurrentMap has to re-abstract all the new methods >> as their >> default implementations are invalid for maps that people expect atomic >> operations from. > > Yes/no. There is some risk in doing this in the way > I posted for both the four existing ConcurrentMap and four > added functional-form methods: > A concrete concurrent class will no longer be > forced by a compiler to override to provide atomicity. > And there is of course no way we can do it for them. > But users will naturally expect otherwise. > This is why I've been fleshing these all out, > so people could help decide if this is the best move. > I'm still thinking barely-yes, but am a little nervous > about it. I'm not quite sure how to interpret your "barely-yes". :) But this is the prime motivating example I had for requiring re-abstraction. Another alternative would be to have ConcurrentMap redefine defaults that throw UnsupportedOperationException. But I don't think that is a good model to have. If we re-abstract then existing implementations of ConcurrentMap (outside the JDK) will not recompile without providing these new methods; and if the existing classes are used and the new methods invoked then they will get AbstractMethodError. That seems reasonable to me. 
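The re-abstraction mechanics under discussion can be sketched in a few lines (hypothetical `SimpleMap`/`SimpleConcurrentMap` interfaces, not the real `java.util` types): redeclaring the method abstract in the sub-interface discards the inherited default, so concrete implementations compiled against the new interface are forced to override it, while previously compiled classes would hit `AbstractMethodError` at link time if the method is invoked.

```java
import java.util.HashMap;
import java.util.Map;

interface SimpleMap<K, V> {
    V get(K key);
    V put(K key, V value);

    // Non-atomic default: acceptable for plain maps.
    default V putIfAbsent(K key, V value) {
        V v = get(key);
        return (v == null) ? put(key, value) : v;
    }
}

interface SimpleConcurrentMap<K, V> extends SimpleMap<K, V> {
    // Re-abstraction: no body, so the inherited default is discarded and
    // the compiler now requires implementations to provide an atomic version.
    @Override
    V putIfAbsent(K key, V value);
}

class SyncMap<K, V> implements SimpleConcurrentMap<K, V> {
    private final Map<K, V> impl = new HashMap<>();
    public V get(K key) { return impl.get(key); }
    public V put(K key, V value) { return impl.put(key, value); }
    // Forced override; a crude coarse lock stands in for real atomicity here.
    public synchronized V putIfAbsent(K key, V value) {
        V v = impl.get(key);
        return (v == null) ? impl.put(key, value) : v;
    }
}
```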
David > -Doug > From dl at cs.oswego.edu Fri Dec 7 05:14:58 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 07 Dec 2012 08:14:58 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C1E96E.5020507@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1E96E.5020507@oracle.com> Message-ID: <50C1EBD2.20608@cs.oswego.edu> On 12/07/12 08:04, David Holmes wrote: > I'm not quite sure how to interpret your "barely-yes". > What I was mainly getting at is: Should the cases of existing ConcurrentMap methods and the new function-accepting methods work differently? Suppose someone calls computeIfAbsent on a JDK7-compliant (but non-JDK) ConcurrentMap that doesn't have an explicit override. They would get the non-atomic version. And this is considered OK according to the the specs as I wrote them. But still surprising to at least some users. -Doug From brian.goetz at oracle.com Fri Dec 7 05:46:44 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 07 Dec 2012 08:46:44 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C1E7FF.8040307@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> Message-ID: <50C1F344.3020905@oracle.com> I think David's strategy of reabstracting in ConcurrentMap makes sense. CM changes the semantics of these methods compared to what's in Map. As it turns out, this requires no code change because the existing declaration in CM will act as a reabstraction. On 12/7/2012 7:58 AM, Doug Lea wrote: > On 12/07/12 02:18, David Holmes wrote: > >>> Right. The idea is that ALL of these would have the same disclaimer >>> as the >>> new lambda-friendly ones I listed: >>> >>> * >>> *

The default implementation makes no guarantees about >>> * synchronization or atomicity properties of this method. Any >>> * class overriding this method must specify its concurrency >>> * properties. >>> * >> >> Which means that ConcurrentMap has to re-abstract all the new methods >> as their >> default implementations are invalid for maps that people expect atomic >> operations from. > > Yes/no. There is some risk in doing this in the way > I posted for both the four existing ConcurrentMap and four > added functional-form methods: > A concrete concurrent class will no longer be > forced by a compiler to override to provide atomicity. > And there is of course no way we can do it for them. > But users will naturally expect otherwise. > This is why I've been fleshing these all out, > so people could help decide if this is the best move. > I'm still thinking barely-yes, but am a little nervous > about it. > > -Doug > From forax at univ-mlv.fr Fri Dec 7 05:55:04 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 07 Dec 2012 14:55:04 +0100 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C1F344.3020905@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> Message-ID: <50C1F538.2010006@univ-mlv.fr> On 12/07/2012 02:46 PM, Brian Goetz wrote: > I think David's strategy of reabstracting in ConcurrentMap makes > sense. CM changes the semantics of these methods compared to what's > in Map. As it turns out, this requires no code change because the > existing declaration in CM will act as a reabstraction. yes, it's better than providing a semantics that will stab users in the back. R?mi > > On 12/7/2012 7:58 AM, Doug Lea wrote: >> On 12/07/12 02:18, David Holmes wrote: >> >>>> Right. 
The idea is that ALL of these would have the same disclaimer >>>> as the >>>> new lambda-friendly ones I listed: >>>> >>>> * >>>> *

The default implementation makes no guarantees about >>>> * synchronization or atomicity properties of this method. Any >>>> * class overriding this method must specify its concurrency >>>> * properties. >>>> * >>> >>> Which means that ConcurrentMap has to re-abstract all the new methods >>> as their >>> default implementations are invalid for maps that people expect atomic >>> operations from. >> >> Yes/no. There is some risk in doing this in the way >> I posted for both the four existing ConcurrentMap and four >> added functional-form methods: >> A concrete concurrent class will no longer be >> forced by a compiler to override to provide atomicity. >> And there is of course no way we can do it for them. >> But users will naturally expect otherwise. >> This is why I've been fleshing these all out, >> so people could help decide if this is the best move. >> I'm still thinking barely-yes, but am a little nervous >> about it. >> >> -Doug >> From dl at cs.oswego.edu Fri Dec 7 06:10:57 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 07 Dec 2012 09:10:57 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C1F538.2010006@univ-mlv.fr> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> Message-ID: <50C1F8F1.6050701@cs.oswego.edu> On 12/07/12 08:55, Remi Forax wrote: > On 12/07/2012 02:46 PM, Brian Goetz wrote: >> I think David's strategy of reabstracting in ConcurrentMap makes sense. CM >> changes the semantics of these methods compared to what's in Map. As it turns >> out, this requires no code change because the existing declaration in CM will >> act as a reabstraction. Yes, this is/was my intent. > > yes, it's better than providing a semantics that will stab users in the back. > Which works out OK for the CM methods. 
I'm comforted to see that people now implicitly share my anxiety about the new function-accepting methods. But that doesn't get me closer to a resolution :-) So, to re-ask: > Should the cases of existing ConcurrentMap methods and the new > function-accepting methods work differently? Suppose > someone calls computeIfAbsent on a JDK7-compliant > (but non-JDK) ConcurrentMap that doesn't have an explicit > override. They would get the non-atomic version. And this is > considered OK according to the the specs as I wrote them. > But still surprising to at least some users. From brian.goetz at oracle.com Fri Dec 7 06:22:17 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 07 Dec 2012 09:22:17 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C1F8F1.6050701@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> Message-ID: <50C1FB99.5060103@oracle.com> Let's back up. I don't understand the premise of the question. CM already has methods putIfAbsent and friends, which are currently abstract in CM, and will remain abstract in CM even though Map will acquire them with non-atomic defaults. No problem so far. CM will now acquire some NEW methods, taking lambdas. Like: computeIfAbsent(K, K->V) The following default implementation *is* atomic because putIfAbsent is already atomic in CM: computeIfAbsent(K key, K->V fn) { if (!containsKey(key)) putIfAbsent(key, fn.apply(key)); } What it doesn't do is guarantee that the generating function is called exactly once, or that the result of that generating function is not discarded. The better version implemented by CHM will have better properties -- that the function is called exactly once and the result is never discarded. But I don't see where the non-atomicity comes from? 
> So, to re-ask: > >> Should the cases of existing ConcurrentMap methods and the new >> function-accepting methods work differently? Suppose >> someone calls computeIfAbsent on a JDK7-compliant >> (but non-JDK) ConcurrentMap that doesn't have an explicit >> override. They would get the non-atomic version. And this is >> considered OK according to the the specs as I wrote them. >> But still surprising to at least some users. From dl at cs.oswego.edu Fri Dec 7 06:51:50 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 07 Dec 2012 09:51:50 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C1FB99.5060103@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> Message-ID: <50C20286.90002@cs.oswego.edu> On 12/07/12 09:22, Brian Goetz wrote: > The following default implementation *is* atomic because putIfAbsent is already > atomic in CM: > > computeIfAbsent(K key, K->V fn) { > if (!containsKey(key)) > putIfAbsent(key, fn.apply(key)); > } > Thanks for prod to try recasting this wrt *scopes* of atomicity. Which, with some further re-work propagates up to plain Map versions as well without need for re-abstraction. I'll try it out and post an update. Basic idea: defaults for function-accepting Map methods are solely in terms of the 4 CM methods, which are in turn non-atomic for non-CM. But implementations can if desired/possible, further widen atomicity scope to include the function call. 
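Brian's fragment above, fleshed out as a compilable sketch (a hypothetical static helper, not the eventual default method): the map update itself is atomic because it goes through putIfAbsent, but under contention the mapping function may be invoked on several threads, with all but one result discarded.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

class ComputeIfAbsentSketch {
    static <K, V> V computeIfAbsent(ConcurrentMap<K, V> map, K key,
                                    Function<? super K, ? extends V> fn) {
        V v = map.get(key);
        if (v == null) {
            V candidate = fn.apply(key);               // may be computed redundantly
            V raced = map.putIfAbsent(key, candidate); // atomic: exactly one wins
            v = (raced == null) ? candidate : raced;   // loser's candidate is discarded
        }
        return v;
    }

    // Small single-threaded demo: second call finds the cached value,
    // so the function runs only once here.
    static int[] demo() {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        int[] calls = {0};
        Function<String, Integer> lengthOf = k -> { calls[0]++; return k.length(); };
        int first = computeIfAbsent(map, "abc", lengthOf);
        int second = computeIfAbsent(map, "abc", lengthOf);
        return new int[] {first, second, calls[0]};
    }
}
```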
-Doug From dl at cs.oswego.edu Fri Dec 7 08:16:34 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 07 Dec 2012 11:16:34 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C20286.90002@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> Message-ID: <50C21662.8050308@cs.oswego.edu> On 12/07/12 09:51, Doug Lea wrote: > Basic idea: defaults for function-accepting Map methods are solely > in terms of the 4 CM methods, which are in turn non-atomic for non-CM.... See update at: http://gee.cs.oswego.edu/dl/wwwtmp/apis/Map.html These probably now take more effort for users to understand, but otherwise seem pretty good to me. Any complaints? (I reordered some of the decls so that they would flow a little better for the 5% (if that) of users who ever read the Javadocs sequentially.) -Doug From forax at univ-mlv.fr Fri Dec 7 08:28:25 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 07 Dec 2012 17:28:25 +0100 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C21662.8050308@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> Message-ID: <50C21929.20204@univ-mlv.fr> On 12/07/2012 05:16 PM, Doug Lea wrote: > On 12/07/12 09:51, Doug Lea wrote: > >> Basic idea: defaults for function-accepting Map methods are solely >> in terms of the 4 CM methods, which are in turn non-atomic for >> non-CM.... 
> > See update at: > http://gee.cs.oswego.edu/dl/wwwtmp/apis/Map.html > > These probably now take more effort for users to understand, but > otherwise seem pretty good to me. > > Any complaints? I just don't like compute, the verb is too generic, update/updateValue is perhaps better. > > (I reordered some of the decls so that they would flow a little > better for the 5% (if that) of users who ever read the Javadocs > sequentially.) > > -Doug > Rémi From dl at cs.oswego.edu Fri Dec 7 08:36:33 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 07 Dec 2012 11:36:33 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C21929.20204@univ-mlv.fr> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50C21929.20204@univ-mlv.fr> Message-ID: <50C21B11.30901@cs.oswego.edu> On 12/07/12 11:28, Remi Forax wrote: > I just don't like compute, the verb is too generic, update/updateValue is > perhaps better. I hated it too when Bob Lee et al lobbied for it, but enough years have gone by that I often forget that I hate it :-) Maybe this will happen to me some decade for "Block"... (The main reason for keeping "compute" is the Guava precedence.) 
-Doug From brian.goetz at oracle.com Fri Dec 7 08:50:49 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 07 Dec 2012 11:50:49 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C21662.8050308@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> Message-ID: <50C21E69.8080904@oracle.com> Extraneous {@code fragment in compute(). I think the semantics of compute can be explained better; people will likely confuse it with computeIfAbsent. There's an aspect of reduction going on here; the remapping function is like a reducer under the "null means nothing there" semantics. It would be great if the name could suggest this; barring that the docs should explain it. Only when users get to the "create or append" example will they get it. Seems like you didn't yet fold in the bits about scope of atomicity? Will that go in CM docs? What do you mean by "possibly iterative equivalent of"? On 12/7/2012 11:16 AM, Doug Lea wrote: > On 12/07/12 09:51, Doug Lea wrote: > >> Basic idea: defaults for function-accepting Map methods are solely >> in terms of the 4 CM methods, which are in turn non-atomic for non-CM.... > > See update at: > http://gee.cs.oswego.edu/dl/wwwtmp/apis/Map.html > > These probably now take more effort for users to understand, but > otherwise seem pretty good to me. > > Any complaints? > > (I reordered some of the decls so that they would flow a little > better for the 5% (if that) of users who ever read the Javadocs > sequentially.) 
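The "create or append" idiom mentioned above is the clearest illustration of compute()'s reduction-like semantics (shown here in the shape the API eventually took in JDK 8; at the time of this thread the signatures were still in flux):

```java
import java.util.HashMap;
import java.util.Map;

class CreateOrAppend {
    public static void main(String[] args) {
        Map<String, String> msgs = new HashMap<>();
        // "null means nothing there": the first call creates the entry,
        // later calls append to the existing value.
        msgs.compute("k", (key, old) -> (old == null) ? "a" : old + "," + "a");
        msgs.compute("k", (key, old) -> (old == null) ? "b" : old + "," + "b");
        System.out.println(msgs.get("k")); // prints "a,b"
    }
}
```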
> > -Doug > > > > From Donald.Raab at gs.com Fri Dec 7 09:34:51 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Fri, 7 Dec 2012 12:34:51 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C21B11.30901@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50C21929.20204@univ-mlv.fr> <50C21B11.30901@cs.oswego.edu> Message-ID: <6712820CB52CFB4D842561213A77C05404BE2033C0@GSCMAMP09EX.firmwide.corp.gs.com> > -----Original Message----- > From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda- > libs-spec-experts-bounces at openjdk.java.net] On Behalf Of Doug Lea > Sent: Friday, December 07, 2012 11:37 AM > To: lambda-libs-spec-experts at openjdk.java.net > Subject: Re: ConcurrentHashMap/ConcurrentMap/Map.compute > > On 12/07/12 11:28, Remi Forax wrote: > > > I just don't like compute, the verb is too generic, > update/updateValue > > is perhaps better. > > I hated it too when Bob Lee et al lobbied for it, but enough years have > gone by that I often forget that I hate it :-) Maybe this will happen > to me some decade for "Block"... > > (The main reason for keeping "compute" is the Guava precedence.) > > -Doug > > We went with the equivalent names in Smalltalk's Dictionary class (surprise!). They're not quite as nice without keyword message support, but they are easy enough to understand IMO. /** * Get and return the value in the Map at the specified key. Alternatively, if there is no value in the map at the key, * return the result of evaluating the specified Function0, and put that value in the map at the specified key. 
*/ V getIfAbsentPut(K key, Function0 function); /** * Get and return the value in the Map at the specified key. Alternatively, if there is no value in the map for that key * return the result of evaluating the specified Function using the specified parameter, and put that value in the * map at the specified key. */

V getIfAbsentPutWith(K key, Function function, P parameter); /** * Return the value in the Map that corresponds to the specified key, or if there is no value at the key, return the * result of evaluating the specified Function0. */ V getIfAbsent(K key, Function0 function); /** * Return the value in the Map that corresponds to the specified key, or if there is no value at the key, return {@code value}. */ V getIfAbsentValue(K key, V value); /** * Return the value in the Map that corresponds to the specified key, or if there is no value at the key, return the * result of evaluating the specified function and parameter. */

V getIfAbsentWith(K key, Function function, P parameter); /** * If there is a value in the Map that corresponds to the specified key return the result of applying the specified * Function on the value, otherwise return null. */ A ifPresentApply(K key, Function function); These are split between MutableMap and MapIterable (MutableMap's readable parent). https://github.com/goldmansachs/gs-collections/blob/master/collections-api/src/main/java/com/gs/collections/api/map/MutableMap.java https://github.com/goldmansachs/gs-collections/blob/master/collections-api/src/main/java/com/gs/collections/api/map/MapIterable.java We have no equivalent of if present do something and put the result back. Based on our naming scheme it would be something like ifPresentPut or ifPresentApplyPut. Does computeIfAbsent mutate the map or not? It is not clear from the name. I like explicit names. BTW, putIfAbsent is already a precedent as well. From dl at cs.oswego.edu Fri Dec 7 09:43:05 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 07 Dec 2012 12:43:05 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C21E69.8080904@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50C21E69.8080904@oracle.com> Message-ID: <50C22AA9.7080106@cs.oswego.edu> On 12/07/12 11:50, Brian Goetz wrote: > Extraneous {@code fragment in compute(). Thanks! > > I think the semantics of compute can be explained better; people will likely > confuse it with computeIfAbsent. There's an aspect of reduction going on > here; the remapping function is like a reducer under the "null means nothing > there" semantics. 
It would be great if the name could suggest this; barring > that the docs should explain it. Only when users get to the "create or > append" example will they get it. Right: compute() and merge() are paired in this sense. (and computeIfPresent and computeIfAbsent paired in a different sense). So if we can think of a better name, please suggest one. No one did when I asked on c-i list last year for CHM. > > Seems like you didn't yet fold in the bits about scope of atomicity? Will > that go in CM docs? > > What do you mean by "possibly iterative equivalent of"? I told you that they are a little hard to understand :-) The "possibly iterative" is explained later by the sentence: "In concurrent contexts, if attempts to replace or remove fail, they may be retried after remapping updated values." And the scope is "explained" by adding the "or the application of the remapping function" in the atomicity disclaimer parag. One could /should go into a lot more detail about using functions in concurrent/parallel contexts, but I think that there will need to be some added package-level discussion of this somewhere anyway, that this can refer to. 
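The compute()/merge() pairing is easiest to see in the reduction idiom merge() supports, where the remapping function acts as a reducer under the "null means nothing there" convention (again shown in the form the API eventually shipped with in JDK 8):

```java
import java.util.HashMap;
import java.util.Map;

class MergeAsReduction {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : new String[] {"a", "b", "a"}) {
            // Absent key: entry takes the value 1.
            // Present key: existing value is reduced with Integer::sum.
            counts.merge(w, 1, Integer::sum);
        }
        System.out.println(counts.get("a") + " " + counts.get("b")); // 2 1
    }
}
```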
-Doug From dl at cs.oswego.edu Fri Dec 7 11:49:57 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 07 Dec 2012 14:49:57 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <6712820CB52CFB4D842561213A77C05404BE2033C0@GSCMAMP09EX.firmwide.corp.gs.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50C21929.20204@univ-mlv.fr> <50C21B11.30901@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404BE2033C0@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50C24865.2050804@cs.oswego.edu> On 12/07/12 12:34, Raab, Donald wrote: > We went with the equivalent names in Smalltalk's Dictionary class (surprise!). They're not quite as nice without keyword message support, but they are easy enough to understand IMO. > >

V getIfAbsentPutWith(K key, Function function, P parameter); > I'd normally side with ugly-but-clear. But I think this is too far past merely ugly even for me :-) -Doug From Donald.Raab at gs.com Fri Dec 7 13:01:24 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Fri, 7 Dec 2012 16:01:24 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C24865.2050804@cs.oswego.edu> References: <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50C21929.20204@univ-mlv.fr> <50C21B11.30901@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404BE2033C0@GSCMAMP09EX.firmwide.corp.gs.com> <50C24865.2050804@cs.oswego.edu> Message-ID: <6712820CB52CFB4D842561213A77C05404BE20341A@GSCMAMP09EX.firmwide.corp.gs.com> > > > >

V getIfAbsentPutWith(K key, Function > > function, P parameter); > > > > I'd normally side with ugly-but-clear. But I think this is too far past > merely ugly even for me :-) > > -Doug The "With" methods are certainly ugly. But we use them consistently across GS Collections to create more opportunities for anonymous and inner class instances to be held in static variables. We also have selectWith, rejectWith, collectWith, etc. The "With" signals "ugly optimization here". If you want something optimized, you must not mind it being more ugly. :-) Here's a simple example of allSatisfyWith. public boolean containsAll(Collection source) { return Iterate.allSatisfyWith(source, Predicates2.in(), this); } The result of Predicates2.in() is a static instance of a two argument predicate. The predicate will be passed "each" and the reference to "this" which is the collection implementing containsAll. This could have been written as just allSatisfy as follows: public boolean containsAll(Collection source) { return Iterate.allSatisfy (source, Predicates.in(this)); } The difference here is that Predicates.in(this) has to create a new instance. We use the specific method you highlighted in our AbstractMutableMultimap class. public boolean put(K key, V value) { C collection = this.getIfAbsentPutCollection(key); if (collection.add(value)) { this.incrementTotalSize(); return true; } return false; } private C getIfAbsentPutCollection(K key) { return this.map.getIfAbsentPutWith(key, this.createCollectionBlock(), this); } The createCollectionBlock() will return the same static instance here for all multimaps. This means every call to put doesn't create a function unnecessarily. From mike.duigou at oracle.com Fri Dec 7 13:02:09 2012 From: mike.duigou at oracle.com (Mike Duigou) Date: Fri, 7 Dec 2012 13:02:09 -0800 Subject: Request for Review : CR#8004015 : [final (?) 
pass] Add interface extends and defaults for basic functional interfaces In-Reply-To: <50C088BE.1040403@oracle.com> References: <0CABE1EF-B971-43A0-ABB8-3EE3D82DC029@oracle.com> <50C035F5.4050007@oracle.com> <50C088BE.1040403@oracle.com> Message-ID: <2D5B0B63-59C9-4989-B8C7-D5A9D41FF7C0@oracle.com> On Dec 6 2012, at 03:59 , Chris Hegarty wrote: > Mike, > > Some small comments. > > 1) IntUnaryOperator.java > > Typo in: > + 30 *

This is the primitive type specialization of {@link IntUnaryOperator} for > + 31 * {@code int} and also may be used as a {@code IntUnaryOperator}. When > + 32 * used as a {@code IntUnaryOperator} the default {@code operate} implementation > + 33 * provided by this interface neither accepts null parameters nor does it return > + 34 * null results. > > IntUnaryOperator -> UnaryOperator Corrected. > > 2) Double/Int/Long Function > > "When used as a Function the default apply implementation provided > by this interface neither accepts null parameters nor does it > return null results." > > "When used as a Function", is this really necessary, or should the > behavior of the default apply method just be described? I agree that this is somewhat awkward. I will see if I can't think of something better. > Why the restriction on null parameters, it seems overly restrictive > since applyAsXXXX accepts nulls, right? Corrected. Thank you for noticing this. > 3) package description > > "null values are accepted and returned by these functional > interfaces according to the constraints of the specification in > which the functional interfaces are used. The functional interfaces > themselves do not constrain or mandate use of null values. Most > usages of the functional interfaces will define the role, if any, > of null for that context." > > Given these changes, doesn't this need to be reworked ( to indicate > that some methods specify null value behavior)? > > 4) Trivially, IntSupplier is missing a '

', before "This is the > primitive type..." > > 5) I agree with David, NPE should be defined where applicable. I am adding these though I am still somewhat resistant for reasons I will mention in next review cycle thread Mike From Donald.Raab at gs.com Fri Dec 7 14:52:38 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Fri, 7 Dec 2012 17:52:38 -0500 Subject: Updated our training kata to use the 12/4 Lambda Binaries Message-ID: <6712820CB52CFB4D842561213A77C05404BE203436@GSCMAMP09EX.firmwide.corp.gs.com> The latest binary required no changes for the GS Collections solutions branch which is nice. The following branch has the solutions written with the Java 8 Streams library changes. https://github.com/goldmansachs/gs-collections-kata/commits/solutions-java8-jcf You can view the diff changes here: https://github.com/goldmansachs/gs-collections-kata/commit/9c344e12e3f502c9f9f2961135a6b095207f9d88 The biggest surprise that required me to change some code was the result of groupBy not returning a List anymore. I had to wrap the result in a Bag to compare it to the expected results in the tests. There was also a problem I had where I was trying to reuse a stream, which I know is no longer allowed. Other than that it was mostly replace Mapper with Function, and replace functions with function in the package names. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121207/37d0310a/attachment.html From forax at univ-mlv.fr Sat Dec 8 07:45:43 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 08 Dec 2012 16:45:43 +0100 Subject: Combiner & BiFunction Message-ID: <50C360A7.1060306@univ-mlv.fr> I've just found that we have the very same functional interface twice, Combiner and BiFunction. 
Rémi
From brian.goetz at oracle.com Sat Dec 8 07:52:13 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 10:52:13 -0500 Subject: Combiner & BiFunction In-Reply-To: <50C360A7.1060306@univ-mlv.fr> References: <50C360A7.1060306@univ-mlv.fr> Message-ID: <50C3622D.2030903@oracle.com> Yes, it's on my list to rationalize these. On 12/8/2012 10:45 AM, Remi Forax wrote: > I've just found that we have the very same functional interface twice, > Combiner and BiFunction. > > Rémi >
From forax at univ-mlv.fr Sat Dec 8 08:30:32 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 08 Dec 2012 17:30:32 +0100 Subject: Combiner & BiFunction In-Reply-To: <50C3622D.2030903@oracle.com> References: <50C360A7.1060306@univ-mlv.fr> <50C3622D.2030903@oracle.com> Message-ID: <50C36B28.7070107@univ-mlv.fr> On 12/08/2012 04:52 PM, Brian Goetz wrote: > Yes, it's on my list to rationalize these. Can you also normalize the Function and Operator type names? When it was Mapper instead of Function it was not a big deal, but now the Function and Operator names are not aligned:

Function -> UnaryOperator
BiFunction -> BinaryOperator

Also, the experience of other languages shows that sometimes users will want to create a function with 5 arguments; we will obviously not add TriFunction, QuadriFunction, QuintiFunction, etc. to the JDK. I think that using a Latin prefix to indicate the arity is not the best convention; other languages tend to use Function, Function2, Function3 and so on. So I propose

Function -> Operator
Function2 -> Operator2

with the convention that if there is no number the arity is 1. Also, using a suffix is better because the primitive specializations use a prefix (avoiding the question: is it IntBiFunction or BiIntFunction?). regards, Rémi > > On 12/8/2012 10:45 AM, Remi Forax wrote: >> I've just found that we have the very same functional interface twice, >> Combiner and BiFunction.
>> >> Rémi >>
From brian.goetz at oracle.com Sat Dec 8 08:35:21 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 11:35:21 -0500 Subject: Combiner & BiFunction In-Reply-To: <50C36B28.7070107@univ-mlv.fr> References: <50C360A7.1060306@univ-mlv.fr> <50C3622D.2030903@oracle.com> <50C36B28.7070107@univ-mlv.fr> Message-ID: <50C36C49.407@oracle.com> > Can you also normalize the Function and Operator type names? > > when it was Mapper instead of Function, it was not a big deal, but now > the Function and Operator names are not aligned. > > Function -> UnaryOperator > BiFunction -> BinaryOperator We discussed this one already, and it seemed the conclusion was that people were comfortable with the asymmetry in order to get the benefit of not mucking up the most common type names. The idea is that each base type (Function, Predicate, Block) has a "natural" arity and we'd only use the prefixes for other arities.
From brian.goetz at oracle.com Sat Dec 8 08:39:06 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 11:39:06 -0500 Subject: Stream construction APIs Message-ID: <50C36D2A.6090708@oracle.com> We've done some overhauling on the stream-building APIs and I really like where they landed. (Close followers of the repo will note the churn, but it has stopped.) For clients, the primary way to get a stream is to ask some aggregate for one, as in:

list.stream()
String.chars()
String.codePoints()
Reader.lines()
etc.

For libraries that want to *build* streams (say, because they want to implement Streamable), there is a lower-level API. This lives primarily in Streams. The primary way to construct a stream (which the above implementations use) is from a spliterator and a set of stream flags.
There are four entry points, the cross product of serial/parallel and immediate/deferred spliterator:

static <T> Stream<T> stream(Supplier<Spliterator<T>> supplier, int flags)

static <T> Stream<T> stream(Spliterator<T> spliterator, int flags) {
    return stream(() -> spliterator, flags);
}

static <T> Stream<T> parallel(Supplier<Spliterator<T>> supplier, int flags)

static <T> Stream<T> parallel(Spliterator<T> spliterator, int flags) {
    return parallel(() -> spliterator, flags);
}

The distinction between serial and parallel simply determines whether stream traversals will be single-threaded or parallel. The immediate vs deferred choice (Spliterator vs Supplier) requires a little more explanation. The pervasive non-interference assumption says that stream sources should not be modified while stream operations are in progress. (More precisely, stream implementations are not required to respect such modifications; streams from concurrent data structures are free to provide concurrent-modification-friendly implementations.) Given this, the question remains: when, in time, do the pipeline operations start? There are two logical candidates:

- at the point at which the stream source is captured
- at the point at which a terminal operation is initiated

For cases like c.stream().forEach(), it doesn't matter; these points collapse together. But for cases like:

Collection<T> c = ...
Stream<T> asFilteredStream = c.stream().filter(...);
// populate c
asFilteredStream.forEach(...)

it makes a difference. By having the constructors take a Supplier, it lets us defer binding to the data until the result is needed. While we've mostly come down on the side of more restrictive stream-building (i.e., no forking), this one seems different because we cannot detect it early, and it seems a common mistake waiting to happen. It also turns out to be relatively cheap to support this case.
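(Editorially, the deferred, supplier-based choice described here is the shape that survived into the shipped API as StreamSupport.stream(Supplier, characteristics, parallel). A minimal sketch of what deferred binding buys — using the modern names, not the prototype's Streams.stream, and an illustrative helper method of my own:)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class DeferredBinding {
    // Illustrative helper: builds a stream whose spliterator is obtained
    // lazily via a Supplier, so binding to the data is deferred until the
    // terminal operation begins.
    static List<String> seenByDeferredStream() {
        List<String> source = new ArrayList<>(Arrays.asList("a"));
        Supplier<Spliterator<String>> supplier = source::spliterator;
        Stream<String> deferred =
                StreamSupport.stream(supplier, Spliterator.ORDERED, false);
        source.add("b"); // modified between stream creation and traversal
        // The supplier is invoked here, so the stream sees the current data.
        return deferred.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(seenByDeferredStream()); // [a, b]
    }
}
```

Because the spliterator is requested only when collect() runs, the element added after stream creation is still observed — the behavior the deferred constructors above are meant to guarantee.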
For example, here's the stream() implementation from ArrayList:

return Streams.stream(() -> Arrays.spliterator((E[]) elementData, 0, size),
                      StreamOpFlag.IS_ORDERED | StreamOpFlag.IS_SIZED);

This makes sure we are operating on the real data, as opposed to stale data, at the time we start the stream operations. The other piece of the stream construction is setting stream flags. The defined ones are:

DISTINCT -- elements are distinct according to .equals
SORTED -- elements are sorted according to natural order
ORDERED -- encounter order of elements is considered relevant
SIZED -- stream size is known and finite

Some defaults are:

Collection: SIZED
List: ORDERED, SIZED
Set: DISTINCT, SIZED
SortedSet: DISTINCT, ORDERED, SORTED, SIZED

That's basically it for "how do I make a Stream." There are also some combinators for "make a Spliterator out of an Iterator", "make a spliterator for an array segment", etc. The other stuff in Streams is things like stream generators, which maybe should be moved to a Generators class?
From brian.goetz at oracle.com Sat Dec 8 08:58:44 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 11:58:44 -0500 Subject: Constructing parallel streams Message-ID: <50C371C4.6000905@oracle.com> Following up on the previous note, now that the stream-building APIs have settled into something fairly simple and regular, I'm not completely happy with the arrangement of the stream() / parallel() buns. For collections, stream() and parallel() seem fine; the user already has a collection in hand, and can ask for a sequential or parallel stream. (Separately: I'm starting to prefer stream() / parallelStream() as the bun names here.) But, there are other ways to get a stream:

String.chars()
Reader.lines()
regex.matches(source)
etc.

It seems pretty natural for these things to return Streams. But, in accordance with our "no implicit parallelism" dictum, these streams are serial.
But many of these streams can be operated on in parallel -- so the question is, how would we get a parallel stream out of these? One obvious choice is to have two operations for each of these:

String.chars()
String.charsAsParallelStream()

That's pretty ugly, and unlikely to be consistently implemented. Now that the Streams construction API and internals have shaken out, another option has emerged. A Spliterator can be traversed sequentially or in parallel. Many sequential streams are constructed out of spliterators that already know how to split (e.g., Arrays.spliterator), and we know how to expose some parallelism from otherwise sequential data sources anyway (see the implementation of Iterators.spliterator). Just because iteration is sequential does not mean there is no exploitable parallelism. So, here's what I propose. Currently, we have a .sequential() operation, which is a no-op on sequential streams and on parallel streams acts as a barrier so that upstream computation can occur in parallel but downstream computation can occur serially, in encounter order (if defined), within-thread. We've also got a spliterator() "escape hatch". We can add to these a .parallel() operation, which on parallel streams is a no-op. The implementation is very simple and efficient (if applied early on in the pipeline). Here's the default implementation (which is probably good enough for all cases):

Stream<T> parallel() {
    if (isParallel())
        return this;
    else
        return Streams.parallel(spliterator(), getStreamFlags());
}

What makes this efficient is that if you apply this operation at the very top of the pipeline, it just grabs the underlying spliterator, wraps it in a new stream with the parallel flag set, and keeps going. (If applied farther down the pipeline, spliterator() returns a spliterator wrapped with the intervening operations.) Bringing this back to our API, this enables us to have a .parallel() operation on Stream, so users can say:

string.chars().parallel()...
if they want to operate on the characters in parallel. The default implementation of parallel / parallelStream in Streamable could then be:

default Stream<T> parallel() {
    return stream().parallel();
}

But I think it is still worth keeping the parallel / parallelStream bun for collections, since this is such an important use case (and is still slightly more efficient; a few fewer object creations.)
From tim at peierls.net Sat Dec 8 09:11:23 2012 From: tim at peierls.net (Tim Peierls) Date: Sat, 8 Dec 2012 12:11:23 -0500 Subject: Combiner & BiFunction In-Reply-To: <50C36C49.407@oracle.com> References: <50C360A7.1060306@univ-mlv.fr> <50C3622D.2030903@oracle.com> <50C36B28.7070107@univ-mlv.fr> <50C36C49.407@oracle.com> Message-ID: On Sat, Dec 8, 2012 at 11:35 AM, Brian Goetz wrote: > The idea is that each base type (Function, Predicate, Block) had a >> "natural" arity and we'd only use the prefixes for other arities. > > Yes. A foolish consistency... --tim -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121208/5b26d06f/attachment.html
From brian.goetz at oracle.com Sat Dec 8 11:27:33 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 14:27:33 -0500 Subject: "Cancelable" streams Message-ID: <50C394A5.5010003@oracle.com> And another subject that we need to close on -- "cancelable" streams. The primary use case for this is parallel processing on infinite streams, such as streams of event data. Here, you might want to process until some threshold has been reached (find the first N results), until some external event has occurred (process for five minutes and take the best result; process until asked to shut down.) As with .parallel(), the recent stabilization of the Stream.spliterator() escape hatch provides us a low-overhead way to support this without introducing new abstractions like CancelableStream or StreamFuture.
Not surprisingly, the answer is a stream op:

stream.cancelOn(BooleanSupplier shouldCancel, Runnable onCancel)
      .filter(...)
      .forEach(...)

The way this works is that it polls the supplied BooleanSupplier to ask "should we cancel now." Once canceled, it acts as a gate shutting; no more elements are sent downstream, so downstream processing completes as if the stream were truncated. When cancelation occurs, it calls the onCancel Runnable so that the client can have a way to know that the pipeline completed due to cancelation rather than normal completion. A typical use might be:

stream.cancelOn(() -> (System.currentTimeMillis() > endTime),
                () -> cancelFlag.set(true))
      .filter(...)
      .forEach(...)

The implementation is simple:

Stream<T> cancelOn(...) {
    return Streams.stream(cancelingSpliterator(spliterator()), getStreamFlags());
}

The cancelation model is not "stop abruptly when the cancelation signal comes", but a more cooperative "use the cancelation signal to indicate that we should not start any more work." So if you're looking to stop after finding 10 candidate matches, it might actually find 11 or 12 before it stops -- but that's something the client code can deal with. For sequential streams, the semantics and implementation of the canceling spliterator are simple -- once the cancel signal comes, no more elements are dispensed from the iterator. For parallel streams WITHOUT a defined encounter order, it is similarly simple -- once the signal comes, no more elements are dispensed to any subtask, and no more splits are produced. For parallel streams WITH a defined encounter order, some more work is needed to define the semantics. A reasonable semantics would be: identify the latest chunk of input in the encounter order that has started processing, let any earlier chunks complete normally, and don't start any later chunks. This seems simple to spec and implement, unintrusive, and reasonably intuitive.
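(Editorial aside: cancelOn never made it into the shipped API, but the gate-shutting model for the sequential case can be sketched as a plain Iterator wrapper. All names below are illustrative, not from the prototype:)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BooleanSupplier;

// Sketch: once shouldCancel reports true, the gate shuts -- no further
// elements are dispensed -- and onCancel runs exactly once so the client
// can tell truncation from normal completion.
public class CancelingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final BooleanSupplier shouldCancel;
    private final Runnable onCancel;
    private boolean canceled;

    public CancelingIterator(Iterator<T> source,
                             BooleanSupplier shouldCancel, Runnable onCancel) {
        this.source = source;
        this.shouldCancel = shouldCancel;
        this.onCancel = onCancel;
    }

    @Override public boolean hasNext() {
        if (!canceled && shouldCancel.getAsBoolean()) {
            canceled = true;   // gate shuts; no more elements dispensed
            onCancel.run();    // signal that we stopped due to cancelation
        }
        return !canceled && source.hasNext();
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return source.next();
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> it = new CancelingIterator<>(
                Arrays.asList(1, 2, 3, 4, 5).iterator(),
                () -> out.size() >= 3,  // cancel once three results are in
                () -> System.out.println("canceled"));
        while (it.hasNext()) out.add(it.next());
        System.out.println(out); // [1, 2, 3]
    }
}
```

The polling frequency here is "every element", which is one end of the responsiveness-vs-overhead tradeoff discussed in the replies that follow.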
From joe.bowbeer at gmail.com Sat Dec 8 11:44:56 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 8 Dec 2012 11:44:56 -0800 Subject: "Cancelable" streams In-Reply-To: <50C394A5.5010003@oracle.com> References: <50C394A5.5010003@oracle.com> Message-ID: Questions: How often is the BooleanSupplier called? Can one implement a BooleanSupplier that depended on the number of elements generated? (Or would you just use a lazy range method instead?) Doesn't cancellation occur as soon as the supplier returns false? If so, what's the advantage of an onCancel method? Or is it possible for the stream to be canceled by some other means? > The way this works is that it polls the supplied BooleanSupplier to ask > "should we cancel now." Once canceled, it acts as a gate shutting; no more > elements are sent downstream, so downstream processing completes as if the > stream were truncated. When cancelation occurs, it calls the onCancel > Runnable so that the client can have a way to know that the pipeline > completed due to cancelation rather than normal completion. On Sat, Dec 8, 2012 at 11:27 AM, Brian Goetz wrote: > The way this works is that it polls the supplied BooleanSupplier to ask > "should we cancel now." Once canceled, it acts as a gate shutting; no more > elements are sent downstream, so downstream processing completes as if the > stream were truncated. When cancelation occurs, it calls the onCancel > Runnable so that the client can have a way to know that the pipeline > completed due to cancelation rather than normal completion. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121208/5d779d18/attachment.html
From brian.goetz at oracle.com Sat Dec 8 11:50:30 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 14:50:30 -0500 Subject: "Cancelable" streams In-Reply-To: References: <50C394A5.5010003@oracle.com> Message-ID: <50C39A06.1070801@oracle.com> > How often is the BooleanSupplier called? This is a tradeoff of responsiveness for overhead. It could be as often as every element; it could be as infrequently as starting every new split. > Can one implement a BooleanSupplier that depended on the number of > elements generated? (Or would you just use a lazy range method instead?) With some user responsibility for thread-safety, yes. For example:

Collection answers = new ThreadSafeCollection();
stream.cancelOn(() -> answers.size() >= 10, () -> {})
      .filter(...)
      .forEach(answers::add);

It would be the user's responsibility to ensure that access to the shared data is properly protected. > Doesn't cancellation occur as soon as the supplier returns false? If > so, what's the advantage of an onCancel method? The onCancel lambda is so that there can be a feedback mechanism by which the client can answer "did my pipeline complete because it ran out of elements, or because we stopped processing due to cancelation?"
From brian.goetz at oracle.com Sat Dec 8 12:38:40 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 15:38:40 -0500 Subject: groupBy / reduceBy Message-ID: <50C3A550.6010905@oracle.com> So, I hate groupBy/reduceBy. Not that I hate the idea, just their current realization. Reasons to hate them: - They intrude Map and Collection into the Stream API, whereas otherwise there would be no connection (except Iterator) to Old Collections. This falls short of a key goal, which is for Streams to be a bridge from Old Collections to New Collections in the future.
We've already severed the 32-bit size limitation; we've distanced ourselves from the pervasive mutability of Old Collections; this is the remaining connection that needs to be severed. - They are limited. You can do one level of group-by, but you can't do two; it requires gymnastics to, for example, take a Stream<T> and do a multi-level tabulation like grouping into a Map<K1, Map<K2, Collection<T>>>. At the same time, they offer limited control over what kind of Map to use, what kind of Collection to use for the values for a given grouping, etc. - Guava-hostile. Guava users would probably like groupBy to return a Multimap. This should be easy, but currently is not. - The name reduceBy makes it completely unclear what it does. - Too-limited control over whether to use map-merging (required if you want to preserve encounter order, but probably slower) or accumulate results directly into a single shared ConcurrentMap (probably faster, but only if you don't care about encounter order). Currently we key off of having an encounter order here, but this should be a user choice, not a framework choice. These negatives play into the motivation for some upcoming proposals about reduce forms, which will propose a new, generalized formulation for these methods that addresses these negatives. Key observations: - groupBy is really just reduceBy where the reduce seed is "new ArrayList" and the combiner function is ArrayList::add - reduceBy is really just a reduce whose combiner function incorporates some mutable map mechanics
From forax at univ-mlv.fr Sat Dec 8 13:34:16 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 08 Dec 2012 22:34:16 +0100 Subject: "Cancelable" streams In-Reply-To: <50C394A5.5010003@oracle.com> References: <50C394A5.5010003@oracle.com> Message-ID: <50C3B258.6020002@univ-mlv.fr> On 12/08/2012 08:27 PM, Brian Goetz wrote: > And another subject that we need to close on -- "cancelable" streams.
> The primary use case for this is parallel processing on infinite > streams, such as streams of event data. Here, you might want to > process until some threshold has been reached (find the first N > results), until some external event has occured (process for five > minutes and take the best result; process until the asked to shut down.) > > As with .parallel(), the recent stabilization of the > Stream.spliterator() escape hatch provides us a low-overhead way to > support this without introducing new abstractions like > CancelableStream or StreamFuture. Not surprisingly, the answer is a > stream op: > > stream.cancelOn(BooleanSupplier shouldCancel, > Runnable onCancel) > .filter(...) > .forEach(...) > > The way this works is that it polls the supplied BooleanSupplier to > ask "should we cancel now." Once canceled, it acts as a gate > shutting; no more elements are sent downstream, so downstream > processing completes as if the stream were truncated. When > cancelation occurs, it calls the onCancel Runnable so that the client > can have a way to know that the pipeline completed due to cancelation > rather than normal completion. > > A typical use might be: > > stream.cancelOn(() -> (System.currentTimeMillis() < endTime), > () -> cancelFlag.set(true)) > .filter(...) > .forEach(...) > > The implementation is simple: > > Stream cancelOn(...) { > return Streams.stream(cancelingSpliterator(spliterator()), > getStreamFlags()); > } > > The cancelation model is not "stop abruptly when the cancelation > signal comes", but a more cooperative "use the cancelation signal to > indicate that we should not start any more work." So if you're > looking to stop after finding 10 candidate matches, it might actually > find 11 or 12 before it stops -- but that's something the client code > can deal with. 
> > > For sequential streams, the semantics and implementation of the > canceling spliterator are simple -- once the cancel signal comes, no > more elements are dispensed from the iterator. For parallel streams > WITHOUT a defined encounter order, it is similarly simple -- once the > signal comes, no more elements are dispensed to any subtask, and no > more splits are produced. For parallel streams WITH a defined > encounter order, some more work is needed to define the semantics. A > reasonable semantics would be: identify the latest chunk of input in > the encounter order that has started processing, let any earlier > chunks complete normally, and don't start any later chunks. > > > This seems simple to spec and implement, unintrusive, and reasonably > intuitive. > The main issue is that your example uses a side effect, () -> cancelFlag.set(true) which goes against the model > Here, you might want to process until some threshold has been reached > (find the first N results), until some external event has occurred > (process for five minutes and take the best result; process until asked to shut down.) find the N results => use limit() some external events => don't use stream, use fork/join directly. Fitting the whole world and the kitchen sink into the Stream API is not a goal. regards, Rémi
From brian.goetz at oracle.com Sat Dec 8 13:51:07 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 16:51:07 -0500 Subject: "Cancelable" streams In-Reply-To: <50C3B258.6020002@univ-mlv.fr> References: <50C394A5.5010003@oracle.com> <50C3B258.6020002@univ-mlv.fr> Message-ID: <50C3B64B.2030609@oracle.com> > The main issue is that your example uses a side effect, > () -> cancelFlag.set(true) > which goes against the model The model is not "side effects are illegal"; we support forEach() and into() which are side-effectful. The model is more "don't use side-effects where they are not needed."
> > Here, you might want to process until some threshold has been reached > > (find the first N results), until some external event has occured > > (process for five minutes and take the best result; process until the > asked to shut down.) > > find the N results => use limit() Parallel limit has some serious limitations that make it pretty unsuitable for this case. While these may be fixable, the effort and distortion involved is far, far greater than what is being suggested here. In fact, I'm on the fence about whether to keep limit at all in its current state; I worry people will expect more of it than it can deliver, and be unhappy. > fitting the whole word and the kitchen sink into the Stream API is not a > goal. No, but this is a pretty lightweight cancelation mechanism -- far more lightweight than limit as currently implemented. We've talked to customers who are very interested in using this for processing infinite event streams. The only thing missing is "how do I make it stop." From forax at univ-mlv.fr Sat Dec 8 14:21:33 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 08 Dec 2012 23:21:33 +0100 Subject: "Cancelable" streams In-Reply-To: <50C3B64B.2030609@oracle.com> References: <50C394A5.5010003@oracle.com> <50C3B258.6020002@univ-mlv.fr> <50C3B64B.2030609@oracle.com> Message-ID: <50C3BD6D.7090401@univ-mlv.fr> On 12/08/2012 10:51 PM, Brian Goetz wrote: >> The main issue is that your example uses side effect, >> () -> cancelFlag.set(true) >> which goes against the model > > The model is not "side effects are illegal"; we support forEach() and > into() which are side-effectful. The model is more "don't use > side-effects where they are not needed." forEach() and into() are terminal, not > >> > Here, you might want to process until some threshold has been reached >> > (find the first N results), until some external event has occured >> > (process for five minutes and take the best result; process until the >> asked to shut down.) 
>> >> find the N results => use limit() > > Parallel limit has some serious limitations that make it pretty > unsuitable for this case. While these may be fixable, the effort and > distortion involved is far, far greater than what is being suggested > here. In fact, I'm on the fence about whether to keep limit at all in > its current state; I worry people will expect more of it than it can > deliver, and be unhappy. I don't get it. You can check the limit the very same way you want to check shouldCancel. > >> fitting the whole word and the kitchen sink into the Stream API is not a >> goal. > > No, but this is a pretty lightweight cancelation mechanism -- far more > lightweight than limit as currently implemented. We've talked to > customers who are very interested in using this for processing > infinite event streams. The only thing missing is "how do I make it > stop." > And why this has to be included in the jdk ? It seems to be a great use case to see if there is enough entrypoints in the jdk to implement that semantics externally. R?mi From brian.goetz at oracle.com Sat Dec 8 14:34:47 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 17:34:47 -0500 Subject: "Cancelable" streams In-Reply-To: <50C3BD6D.7090401@univ-mlv.fr> References: <50C394A5.5010003@oracle.com> <50C3B258.6020002@univ-mlv.fr> <50C3B64B.2030609@oracle.com> <50C3BD6D.7090401@univ-mlv.fr> Message-ID: <50C3C087.5000204@oracle.com> > And why this has to be included in the jdk ? > It seems to be a great use case to see if there is enough entrypoints in > the jdk to implement that semantics externally. A nice idea, but we've already made the decision that we're not ready to publish the StreamOP APIs (the "SPI"), and therefore people will not be able to write their own ops. 
From brian.goetz at oracle.com Sat Dec 8 14:54:47 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 08 Dec 2012 17:54:47 -0500 Subject: "Cancelable" streams In-Reply-To: <50C3BD6D.7090401@univ-mlv.fr> References: <50C394A5.5010003@oracle.com> <50C3B258.6020002@univ-mlv.fr> <50C3B64B.2030609@oracle.com> <50C3BD6D.7090401@univ-mlv.fr> Message-ID: <50C3C537.3000700@oracle.com> >> Parallel limit has some serious limitations that make it pretty >> unsuitable for this case. While these may be fixable, the effort and >> distortion involved is far, far greater than what is being suggested >> here. In fact, I'm on the fence about whether to keep limit at all in >> its current state; I worry people will expect more of it than it can >> deliver, and be unhappy. > > I don't get it. You can check the limit the very same way you want to > check shouldCancel. Only in the serial case (easy) or in the parallel case where we know that we don't have to respect encounter order. But in the general case, encounter order is significant (consider a reduce with an associative but not commutative reducing function), you can't just send the first N that you happen to find through. limit(n) must send the first N in the *encounter order* in this case. This is where the pain and complexity comes from. From forax at univ-mlv.fr Sat Dec 8 15:03:27 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 09 Dec 2012 00:03:27 +0100 Subject: "Cancelable" streams In-Reply-To: <50C3C537.3000700@oracle.com> References: <50C394A5.5010003@oracle.com> <50C3B258.6020002@univ-mlv.fr> <50C3B64B.2030609@oracle.com> <50C3BD6D.7090401@univ-mlv.fr> <50C3C537.3000700@oracle.com> Message-ID: <50C3C73F.5020706@univ-mlv.fr> On 12/08/2012 11:54 PM, Brian Goetz wrote: >>> Parallel limit has some serious limitations that make it pretty >>> unsuitable for this case. While these may be fixable, the effort and >>> distortion involved is far, far greater than what is being suggested >>> here. 
In fact, I'm on the fence about whether to keep limit at all in >>> its current state; I worry people will expect more of it than it can >>> deliver, and be unhappy. >> >> I don't get it. You can check the limit the very same way you want to >> check shouldCancel. > > Only in the serial case (easy) or in the parallel case where we know > that we don't have to respect encounter order. But in the general > case, encounter order is significant (consider a reduce with an > associative but not commutative reducing function), you can't just > send the first N that you happen to find through. limit(n) must send > the first N in the *encounter order* in this case. This is where the > pain and complexity comes from. > that's why we have unordered() R?mi From forax at univ-mlv.fr Sat Dec 8 15:53:00 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 09 Dec 2012 00:53:00 +0100 Subject: "Cancelable" streams In-Reply-To: <50C3C087.5000204@oracle.com> References: <50C394A5.5010003@oracle.com> <50C3B258.6020002@univ-mlv.fr> <50C3B64B.2030609@oracle.com> <50C3BD6D.7090401@univ-mlv.fr> <50C3C087.5000204@oracle.com> Message-ID: <50C3D2DC.7040506@univ-mlv.fr> On 12/08/2012 11:34 PM, Brian Goetz wrote: >> And why this has to be included in the jdk ? >> It seems to be a great use case to see if there is enough entrypoints in >> the jdk to implement that semantics externally. > > A nice idea, but we've already made the decision that we're not ready > to publish the StreamOP APIs (the "SPI"), and therefore people will > not be able to write their own ops. You don't need to write ops to create a Stream, especially if what you want is only a parallel stream. 
R?mi From forax at univ-mlv.fr Sun Dec 9 06:54:40 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 09 Dec 2012 15:54:40 +0100 Subject: groupBy / reduceBy In-Reply-To: <50C3A550.6010905@oracle.com> References: <50C3A550.6010905@oracle.com> Message-ID: <50C4A630.8030907@univ-mlv.fr> On 12/08/2012 09:38 PM, Brian Goetz wrote: > So, I hate groupBy/reduceBy. Not that I hate the idea, just their > current realization. > > Reasons to hate them: > > - They intrude Map and Collection into the Stream API, whereas > otherwise there would be no connection (except Iterator) to Old > Collections. This falls short of a key goal, which is for Streams to > be a bridge from Old Collections to New Collections in the future. > We've already severed the 32-bit size limitation; we've distanced > ourselves from the pervasive mutability of Old Collections; this is > the remaining connection that needs to be severed. > > - They are limited. You can do one level of group-by, but you can't > do two; it requires gymnastics to, for example, take a > Stream and do a multi-level tabulation like grouping into > a Map>. At the same time, > they offer limited control over what kind of Map to use, what kind of > Collection to use for the values for a given grouping, etc. > > - Guava-hostile. Guava users would probably like groupBy to return a > Multimap. This should be easy, but currently is not. > > - The name reduceBy is completely unclear what it does. the other problem with reduceBy is that the combiner is only needed for the parallel case but not for the serial one. > > - Too-limited control over whether to use map-merging (required if > you want to preserve encounter order, but probably slower) or > accumulate results directly into a single shared ConcurrentMap > (probably faster, but only if you don't care about encounter order). > Currently we key off of having an encounter order here, but this > should be a user choice, not a framework choice. 
> > These negatives play into the motivation for some upcoming proposals > about reduce forms, which will propose a new, generalized formulation > for these methods that addresses these negatives. Key observations: > - groupBy is really just reduceBy where the reduce seed is "new > ArrayList" and the combiner function is ArrayList::add You mean: the supplier is new ArrayList and the reducer is ArrayList::add (the combiner is ArrayList::addAll). > - reduceBy is really just a reduce whose combiner function > incorporates some mutable map mechanics > But the issue is that even if it's just a reduce, users will not see it as a reduce, i.e. we have to provide a groupBy. Let's try to do something. - We don't need a map, but something with get() and put(). - We don't need a collection, but something with new() and add(). So we can be fully generic if the user sends 4 lambdas. In fact, they have to be grouped; put() without a get() is useless. With

interface Mapping<K, V> {
    V get(Object key);
    V put(K key, V value);
}

and Destination (which already exists -- we need it in into() -- but currently it has only the method addAll)

interface Destination<T> {
    public boolean add(T element);
    public void addAll(Stream<T> stream);
}

groupBy can be written

<K, V extends Destination<T>, M extends Mapping<K, V>> M groupBy(Function<T, K> classifier, M mapping, Supplier<V> destinationSupplier)

and called like this:

Map<String, List<Person>> map = personStream.groupBy(Person::getName, new HashMap<>(), ArrayList::new)

or, a little better, if method references accept the diamond syntax (I don't remember what we have decided here):

Map<String, List<Person>> map = personStream.groupBy(Person::getName, new HashMap<>(), ArrayList<>::new)

This removes all dependencies on the old collections. Also, the interface Mapping can be changed to something like

interface Mapping<K, V> {
    V lookup(K key);
    void register(K key, V value);
}

if we add lookup and register as default methods in Map.
cheers, Rémi From brian.goetz at oracle.com Sun Dec 9 07:55:14 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 09 Dec 2012 10:55:14 -0500 Subject: groupBy / reduceBy In-Reply-To: <50C4A630.8030907@univ-mlv.fr> References: <50C3A550.6010905@oracle.com> <50C4A630.8030907@univ-mlv.fr> Message-ID: <50C4B462.5090601@oracle.com> > the other problem with reduceBy is that the combiner is only needed for > the parallel case but not for the serial one. Which is true for other nonhomogeneous reducer forms as well. Does this bother us? We've already made the decision to have one Stream type, rather than separate ones for serial and parallel. I think this is the right call, but if people disagree, let's hear it. From forax at univ-mlv.fr Sun Dec 9 09:29:11 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 09 Dec 2012 18:29:11 +0100 Subject: groupBy / reduceBy In-Reply-To: <50C4B462.5090601@oracle.com> References: <50C3A550.6010905@oracle.com> <50C4A630.8030907@univ-mlv.fr> <50C4B462.5090601@oracle.com> Message-ID: <50C4CA67.6020801@univ-mlv.fr> On 12/09/2012 04:55 PM, Brian Goetz wrote: >> the other problem with reduceBy is that the combiner is only needed for >> the parallel case but not for the serial one. > > Which is true for other nonhomogeneous reducer forms as well. Yes, these reducers are ugly too :) I think it's better to specify a mapping function (T -> U) and a combiner (U, U -> U). cheers, Rémi From forax at univ-mlv.fr Mon Dec 10 07:26:39 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 10 Dec 2012 16:26:39 +0100 Subject: Constructing parallel streams In-Reply-To: <50C371C4.6000905@oracle.com> References: <50C371C4.6000905@oracle.com> Message-ID: <50C5FF2F.1020408@univ-mlv.fr> I don't like users being able to call parallel in the middle of the stream construction. I propose to have an interface ParallelizableStream that allows to choose if the user wants the sequential or the parallel stream upfront. 
So the interface can be defined as such interface ParallelizableStream extends Stream { public Stream parallel(); public Stream sequential(); // all other methods delegate to sequential() public default Optional findAny() { return sequential().findAny(); } ... } and Reader.lines() can return a ParallelizableStream. R?mi On 12/08/2012 05:58 PM, Brian Goetz wrote: > Following up on the previous note, now that the stream-building APIs > have settled into something fairly simple and regular, I'm not > completely happy with the arrangement of the stream() / parallel() buns. > > For collections, stream() and parallel() seem fine; the user already > has a collection in hand, and can ask for a sequential or parallel > stream. (Separately: I'm starting to prefer stream() / > parallelStream() as the bun names here.) > > But, there are other ways to get a stream: > > String.chars() > Reader.lines() > regex.matches(source) > etc > > It seems pretty natural for these things to return Streams. But, in > accordance with our "no implicit parallelism" dictum, these streams > are serial. But many of these streams can be operated on in parallel > -- so the question is, how would we get a parallel stream out of these? > > One obvious choice is to have two operations for each of these: > > String.chars() > String.charsAsParallelStream() > > That's pretty ugly, and unlikely to be consistently implemented. > > > Now that the Streams construction API and internals have shaken out, > another option has emerged. A Spliterator can be traversed > sequentially or in parallel. Many sequential streams are constructed > out of spliterators that already know how to split (e.g., > Arrays.spliterator), and, we know how to expose some parallelism from > otherwise sequential data sources anyway (see implementation of > Iterators.spliterator). Just because iteration is sequential does not > mean there is no exploitable parallelism. > > > So, here's what I propose. 
Currently, we have a .sequential() > operation, which is a no-op on sequential streams and on parallel > streams acts as a barrier so that upstream computation can occur in > parallel but downstream computation can occur serially, in encounter > order (if defined), within-thread. We've also got a spliterator() > "escape hatch". > > We can add to these a .parallel() operations, which on parallel > streams is a no-op. The implementation is very simple and efficient > (if applied early on in the pipeline.) > > Here's the default implementation (which is probably good enough for > all cases): > > Stream parallel() { > if (isParallel()) > return this; > else > return Streams.parallel(spliterator(), getStreamFlags()); > } > > What makes this efficient is that if you apply this operation at the > very top of the pipeline, it just grabs the underlying spliterator, > wraps it in a new stream with the parallel flag set, and keeps going. > (If applied farther down the pipeline, spliterator() returns a > spliterator wrapped with the intervening operations.) > > > Bringing this back to our API, this enables us to have a .parallel() > operation on Stream, so users can say: > > string.chars().parallel()... > > if they want to operate on the characters in parallel. > > The default implementation of parallel / parallelStream in Streamable > could then be: > > default Stream parallel() { > return stream().parallel(); > } > > But I think it is still worth keeping the parallel / parallelStream > bun for collections since this is such an important use case (and is > still slightly more efficient; a few fewer object creations.) 
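This is essentially the shape that shipped in Java 8: a stream starts out sequential, and parallel() re-wraps it with the parallel flag set, so applying it at the head of the pipeline is cheap. A small runnable sketch against the released API (the method names below are the final ones, which may differ from the prototype under discussion):

```java
import java.util.stream.IntStream;

public class ParallelHead {

    // Count vowels in a string, flipping to parallel at the head of the
    // pipeline, before any operations are attached.
    public static long countVowels(String s) {
        return s.chars()        // sequential IntStream over the char values
                .parallel()     // re-wrap the source with the parallel flag
                .filter(c -> "aeiou".indexOf(c) >= 0)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countVowels("hello world")); // 3
    }
}
```

The same call farther down the pipeline is still legal in the released API; it simply changes the orientation of the whole pipeline rather than of a suffix.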
From brian.goetz at oracle.com Mon Dec 10 07:50:07 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 10 Dec 2012 10:50:07 -0500 Subject: Constructing parallel streams In-Reply-To: <50C5FF2F.1020408@univ-mlv.fr> References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> Message-ID: <50C604AF.2090007@oracle.com> > I don't like users being able to call parallel in the middle of the > stream construction. I don't love it either. The semantics are perfectly tractable, and the implementation is perfectly straightforward, but the performance is unlikely to be a win in most cases. (I mentioned earlier we would doc that this really should only be done at the head of the pipeline.) > I propose to have an interface ParallelizableStream that allows to > choose if the user want the sequential or the parallel stream upfront. Yeah, we investigated this direction first. Combinatorial explosion: IntParallelizableStream, etc. However, this could trivially become a dynamic property of streams (fits easily into the existing stream flags mechanism). Then only the head streams would have the property, and if you tried to do parallel() farther down the stream, we could ignore it or even throw ISE. From joe.bowbeer at gmail.com Mon Dec 10 08:01:39 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 10 Dec 2012 08:01:39 -0800 Subject: Constructing parallel streams In-Reply-To: <50C604AF.2090007@oracle.com> References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> <50C604AF.2090007@oracle.com> Message-ID: I can easily imagine a pipeline that has alternating sequential/parallel/sequential segments. Is there any reason to discourage a programmer from using the parallel/sequential methods to express this? On Dec 10, 2012 7:50 AM, "Brian Goetz" wrote: > I don't like users being able to call parallel in the middle of the >> stream construction. >> > > I don't love it either. 
The semantics are perfectly tractible, and the > implementation is perfectly straightforward, but the performance is > unlikely to be a win in most cases. (I mentioned earlier we would doc that > this really should only be done at the head of the pipeline.) > > I propose to have an interface ParallelizableStream that allows to >> choose if the user want the sequential or the parallel stream upfront. >> > > Yeah, we investigated this direction first. Combinatorial explosion: > IntParallelizableStream, etc. > > However, this could trivially become a dynamic property of streams (fits > easily into the existing stream flags mechanism). Then only the head > streams would have the property, and if you tried to do parallel() farther > down the stream, we could ignore it or even throw ISE. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121210/7599f3de/attachment.html From brian.goetz at oracle.com Mon Dec 10 08:03:29 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 10 Dec 2012 11:03:29 -0500 Subject: Combiner -> BiFunction? Message-ID: <50C607D1.2050509@oracle.com> Are we OK with just merging Combiner into BiFunction? Both are the same signature: (T,U) -> V. Ordinarily I don't mind having specialized names if the name carries extra meaning, but I'm not sure this one does. From brian.goetz at oracle.com Mon Dec 10 08:08:18 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 10 Dec 2012 11:08:18 -0500 Subject: Constructing parallel streams In-Reply-To: References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> <50C604AF.2090007@oracle.com> Message-ID: <50C608F2.5070905@oracle.com> The only reason is that it may not perform as well as the user expects. The reason for this is that one of the big performance tricks we use is "jamming". When you do foos.filter(...).map(...).reduce(...) 
we can do the filtering, mapping, and reducing in a single pass (serial or parallel.) If you do foos.sequential().filter(...).parallel().map(...).sequential().reduce(...) then you may be introducing "barriers" in the computation, where something has to stop and collect the results before proceeding. This is giving up a lot of the performance benefit of the streams model. (Stateful ops, like sorting or limit, generally have a similar effect.) However, since we don't know anything about what the user is doing in those lambdas, it is conceivable that it is still a win. We do elide sequential/parallel calls if the stream already has that orientation (e.g., parallel on an already parallel stream is a no-op.) Overall I'm mostly in the "don't try to save the user from themselves" camp here. We should document how the model works and let performance-sensitive users measure for themselves. So while it is most effective to put the parallel() at the head of the pipe, my distaste for having it in the middle is merely mild and overall I can live with it. On 12/10/2012 11:01 AM, Joe Bowbeer wrote: > I can easily imagine a pipeline that has alternating > sequential/parallel/sequential segments. Is there any reason to > discourage a programmer from using the parallel/sequential methods to > express this? > > On Dec 10, 2012 7:50 AM, "Brian Goetz" > wrote: > > I don't like users being able to call parallel in the middle of the > stream construction. > > > I don't love it either. The semantics are perfectly tractible, and > the implementation is perfectly straightforward, but the performance > is unlikely to be a win in most cases. (I mentioned earlier we > would doc that this really should only be done at the head of the > pipeline.) > > I propose to have an interface ParallelizableStream that allows to > choose if the user want the sequential or the parallel stream > upfront. > > > Yeah, we investigated this direction first. Combinatorial > explosion: IntParallelizableStream, etc. 
> > However, this could trivially become a dynamic property of streams > (fits easily into the existing stream flags mechanism). Then only > the head streams would have the property, and if you tried to do > parallel() farther down the stream, we could ignore it or even throw > ISE. > From paul.sandoz at oracle.com Mon Dec 10 08:17:58 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 10 Dec 2012 17:17:58 +0100 Subject: Constructing parallel streams In-Reply-To: <50C604AF.2090007@oracle.com> References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> <50C604AF.2090007@oracle.com> Message-ID: <33D1BD61-CD50-4C42-9F24-35F3717686DB@oracle.com> On Dec 10, 2012, at 4:50 PM, Brian Goetz wrote: >> I don't like users being able to call parallel in the middle of the >> stream construction. > > I don't love it either. The semantics are perfectly tractible, and the implementation is perfectly straightforward, but the performance is unlikely to be a win in most cases. (I mentioned earlier we would doc that this really should only be done at the head of the pipeline.) > >> I propose to have an interface ParallelizableStream that allows to >> choose if the user want the sequential or the parallel stream upfront. >> > > Yeah, we investigated this direction first. Combinatorial explosion: IntParallelizableStream, etc. > > However, this could trivially become a dynamic property of streams (fits easily into the existing stream flags mechanism). Then only the head streams would have the property, and if you tried to do parallel() farther down the stream, we could ignore it or even throw ISE. > There might be cases where a stream is handed off to something else that wants to go parallel for stuff further down. Paul. From forax at univ-mlv.fr Mon Dec 10 08:23:03 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 10 Dec 2012 17:23:03 +0100 Subject: Combiner -> BiFunction? 
In-Reply-To: <50C607D1.2050509@oracle.com> References: <50C607D1.2050509@oracle.com> Message-ID: <50C60C67.1000203@univ-mlv.fr> On 12/10/2012 05:03 PM, Brian Goetz wrote: > Are we OK with just merging Combiner into BiFunction? Both are the > same signature: (T,U) -> V. > > Ordinarily I don't mind having specialized names if the name carries > extra meaning, but I'm not sure this one does. > Yes, and I still think that UnaryOperator should be Operator and BinaryOperator should be BiOperator for consistency. Rémi From sam at sampullara.com Mon Dec 10 09:09:36 2012 From: sam at sampullara.com (Sam Pullara) Date: Mon, 10 Dec 2012 09:09:36 -0800 Subject: "Cancelable" streams In-Reply-To: <50C394A5.5010003@oracle.com> References: <50C394A5.5010003@oracle.com> Message-ID: <9FEC5794-6238-4144-875F-42803DE1319C@sampullara.com> I ran into cancellation quite a few times when I was writing a fairly large codebase using Guava. Here is the typical setup: 1) You are trying to process some subset of the source but since you are filtering you don't know how much it will be, e.g. paginating. 2) The source of the data can't filter at a fine enough resolution so you are doing client side filtering. 3) The source of the data is a shared resource and you want to use it as little as possible, e.g. database connections. 4) The amount of work done to each row is much more than the cost of grabbing the row, i.e. the source is faster than the sink. So, as you are processing this stream of data, you find you are done. At that point you want to stop pulling from the source and return the connection to the shared pool of sources. Here is the current gruesome implementation of scanning in Havrobase (on MySQL) that does essentially look ahead for the client ultimately closing the connection and possibly reopening it if the client asks for more. 
If I had a better interface than Iterator it could have been better: https://github.com/spullara/havrobase/blob/master/mysql/src/main/java/avrobase/mysql/MysqlAB.java#L631 I think at the end of the day my use case may require more than sink-side cancellation, but that would have helped. In terms of how it is implemented, I can see two other implementations: 1) Stream cancelOn(AtomicBoolean cancelled); 2) Throwing a specific runtime cancellation exception that the framework recognizes and can propagate. In both of these cases it is obvious to the client that the stream was cancelled. Sam On Dec 8, 2012, at 11:27 AM, Brian Goetz wrote: > And another subject that we need to close on -- "cancelable" streams. The primary use case for this is parallel processing on infinite streams, such as streams of event data. Here, you might want to process until some threshold has been reached (find the first N results), until some external event has occured (process for five minutes and take the best result; process until the asked to shut down.) > > As with .parallel(), the recent stabilization of the Stream.spliterator() escape hatch provides us a low-overhead way to support this without introducing new abstractions like CancelableStream or StreamFuture. Not surprisingly, the answer is a stream op: > > stream.cancelOn(BooleanSupplier shouldCancel, > Runnable onCancel) > .filter(...) > .forEach(...) > > The way this works is that it polls the supplied BooleanSupplier to ask "should we cancel now." Once canceled, it acts as a gate shutting; no more elements are sent downstream, so downstream processing completes as if the stream were truncated. When cancelation occurs, it calls the onCancel Runnable so that the client can have a way to know that the pipeline completed due to cancelation rather than normal completion. > > A typical use might be: > > stream.cancelOn(() -> (System.currentTimeMillis() < endTime), > () -> cancelFlag.set(true)) > .filter(...) > .forEach(...) 
> > The implementation is simple: > > Stream cancelOn(...) { > return Streams.stream(cancelingSpliterator(spliterator()), > getStreamFlags()); > } > > The cancelation model is not "stop abruptly when the cancelation signal comes", but a more cooperative "use the cancelation signal to indicate that we should not start any more work." So if you're looking to stop after finding 10 candidate matches, it might actually find 11 or 12 before it stops -- but that's something the client code can deal with. > > > For sequential streams, the semantics and implementation of the canceling spliterator are simple -- once the cancel signal comes, no more elements are dispensed from the iterator. For parallel streams WITHOUT a defined encounter order, it is similarly simple -- once the signal comes, no more elements are dispensed to any subtask, and no more splits are produced. For parallel streams WITH a defined encounter order, some more work is needed to define the semantics. A reasonable semantics would be: identify the latest chunk of input in the encounter order that has started processing, let any earlier chunks complete normally, and don't start any later chunks. > > > This seems simple to spec and implement, unintrusive, and reasonably intuitive. > From sam at sampullara.com Mon Dec 10 11:00:54 2012 From: sam at sampullara.com (Sam Pullara) Date: Mon, 10 Dec 2012 11:00:54 -0800 Subject: "Cancelable" streams In-Reply-To: <9FEC5794-6238-4144-875F-42803DE1319C@sampullara.com> References: <50C394A5.5010003@oracle.com> <9FEC5794-6238-4144-875F-42803DE1319C@sampullara.com> Message-ID: <08D59B71-0E56-4A43-8A3C-7D0B397B534D@sampullara.com> After talking with Brian a fair bit about the issue, I think the simplest thing that would work for me is to add an override on limit. Stream limit(Predicate allow) Like other predicates this doesn't allow for side effects so you need to be prepared for them to be called out of order. 
However, once the framework receives a false it doesn't have to ever give you another element after that element based on encounter order. Also, it would act as a filter such that anything that returned false wouldn't continue on. Ideally the stream would attempt to stop as soon as possible. Sam On Dec 10, 2012, at 9:09 AM, Sam Pullara wrote: > I ran into cancellation quite a few times when I was writing a fairly large codebase using Guava. Here is the typical setup: > > 1) You are trying to process some subset of the source but since you are filtering you don't know how much it will be, e.g. paginating. > 2) The source of the data can't filter at a fine enough resolution so you are doing client side filtering. > 3) The source of the data is a shared resource and you want to use it as little as possible, e.g. database connections. > 4) The amount of work done to each row is much more than the cost of grabbing the row, i.e. the source is faster than the sink. > > So, as you are processing this stream of data, you find you are done. At that point you want to stop pulling from the source and return the connection to the shared pool of sources. > > Here is the current gruesome implementation of scanning in Havrobase (on MySQL) that does essentially look ahead for the client ultimately closing the connection and possibly reopening it if the client asks for more. If I had a better interface than Iterator it could have been better: > > https://github.com/spullara/havrobase/blob/master/mysql/src/main/java/avrobase/mysql/MysqlAB.java#L631 > > I think at the end of the day my use case may require more than sink-side cancellation, but that would have helped. In terms of how it is implemented, I can see two other implementations: > > 1) Stream cancelOn(AtomicBoolean cancelled); > 2) Throwing a specific runtime cancellation exception that the framework recognizes and can propagate. > > In both of these cases it is obvious to the client that the stream was cancelled. 
> > Sam > > On Dec 8, 2012, at 11:27 AM, Brian Goetz wrote: > >> And another subject that we need to close on -- "cancelable" streams. The primary use case for this is parallel processing on infinite streams, such as streams of event data. Here, you might want to process until some threshold has been reached (find the first N results), until some external event has occured (process for five minutes and take the best result; process until the asked to shut down.) >> >> As with .parallel(), the recent stabilization of the Stream.spliterator() escape hatch provides us a low-overhead way to support this without introducing new abstractions like CancelableStream or StreamFuture. Not surprisingly, the answer is a stream op: >> >> stream.cancelOn(BooleanSupplier shouldCancel, >> Runnable onCancel) >> .filter(...) >> .forEach(...) >> >> The way this works is that it polls the supplied BooleanSupplier to ask "should we cancel now." Once canceled, it acts as a gate shutting; no more elements are sent downstream, so downstream processing completes as if the stream were truncated. When cancelation occurs, it calls the onCancel Runnable so that the client can have a way to know that the pipeline completed due to cancelation rather than normal completion. >> >> A typical use might be: >> >> stream.cancelOn(() -> (System.currentTimeMillis() < endTime), >> () -> cancelFlag.set(true)) >> .filter(...) >> .forEach(...) >> >> The implementation is simple: >> >> Stream cancelOn(...) { >> return Streams.stream(cancelingSpliterator(spliterator()), >> getStreamFlags()); >> } >> >> The cancelation model is not "stop abruptly when the cancelation signal comes", but a more cooperative "use the cancelation signal to indicate that we should not start any more work." So if you're looking to stop after finding 10 candidate matches, it might actually find 11 or 12 before it stops -- but that's something the client code can deal with. 
>> >> >> For sequential streams, the semantics and implementation of the canceling spliterator are simple -- once the cancel signal comes, no more elements are dispensed from the iterator. For parallel streams WITHOUT a defined encounter order, it is similarly simple -- once the signal comes, no more elements are dispensed to any subtask, and no more splits are produced. For parallel streams WITH a defined encounter order, some more work is needed to define the semantics. A reasonable semantics would be: identify the latest chunk of input in the encounter order that has started processing, let any earlier chunks complete normally, and don't start any later chunks. >> >> >> This seems simple to spec and implement, unintrusive, and reasonably intuitive. >> > From brian.goetz at oracle.com Mon Dec 10 12:54:05 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 10 Dec 2012 15:54:05 -0500 Subject: Forms for reduce() -- part 1 Message-ID: <50C64BED.2050100@oracle.com> I have been doing some brainstorming on forms for "fold". My primary goals for revisiting this include: - As mentioned in an earlier note, I want to get Map and Collection out of the Streams API (groupBy and reduceBy currently intrude these). This message lays the groundwork for this and I will follow up on these in a separate note. As I noted, there are many things currently wrong with the current groupBy/reduceBy that I want to fix. - Support "mutable fold" cases better, where the "seed" is really a mutable container (like a StringBuffer.) I'll start with use cases. There are some that fit purely into a traditional functional model, and others that fit better into a mutable model. While one can wedge one into the other, I think it may be better to be explicit about both. I am not suggesting naming right now, they could all be called reduce, though we may want to use different names to describe the functional vs mutable cases. Use cases -- purely functional ------------------------------ 1. 
Homogeneous operations on a monoid (e.g., sum). Here, there is a monoid with a known zero. T reduce(T zero, BinaryOperator<T> reducer) 2. Homogeneous operations on non-monoids (e.g., min). Here, there is no sensible zero, so we use Optional to reflect "nothing there". Ideally we would like to delay boxing to Optional until the very last operation (in other words, use (boolean, T) as the internal state and box to Optional at the very end.) Optional<T> reduce(BinaryOperator<T> reducer) 3. Nonhomogeneous operations (aka foldl, such as "sum of weights"). This requires an additional combiner function for this to work in parallel. U reduce(U zero, (U,T) -> U reducer, (U,U) -> U combiner) Optional<U> reduce(T -> U first, (U,T) -> U reducer, (U,U) -> U combiner) Note that most cases where we might be inclined to return Optional can be written as stream.map(T -> U).reduce(BinaryOperator<U>). Doug points out: if we went with "null means nothing", we wouldn't need the optional forms. This is basically what we have now, though we're currently calling the last form "fold". Doug has suggested we call them all reduce. Sub-question: people are constantly pointing out "but you don't need the combiner for the serial case." My orientation here is that the serial case is a special case, and while we want to ensure that those cases are well-served, we don't necessarily want to distort the API to include things that *only* work in the serial case. Use cases -- mutable -------------------- Many fold-like operations are better expressed with mutable state. We could easily simulate them with the foldl form, but it may well be better to call this form out specially. In these cases, there is also often a distinct internal and external representation. I'll give them the deliberately stupid name mReduce for now. The general form is: mReduce(Supplier<I> makeEmpty, BiBlock<I, T> addElement, BiBlock<I, I> combineResults, Function<I, E> getFinalResult) Here, I is the intermediate form, and E is the result. 
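As a point of comparison, this mutable general form is essentially the shape that later shipped as the three-argument Stream.collect (supplier, accumulator, combiner), with the getFinalResult step left to the caller; a small sketch using the released API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class MutableReduceShape {

    // makeEmpty    -> ArrayList::new
    // addElement   -> List::add
    // combineResults -> List::addAll
    // getFinalResult is just whatever the caller does with the container.
    public static List<String> upperCased(Stream<String> strings) {
        return strings.map(String::toUpperCase)
                      .collect(ArrayList::new, List::add, List::addAll);
    }

    public static void main(String[] args) {
        System.out.println(upperCased(Stream.of("a", "b"))); // [A, B]
    }
}
```

The combiner (List::addAll) is only exercised by parallel pipelines, which is exactly the "you don't need the combiner for the serial case" point raised above.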
There are many cases where computing with an intermediate form is more efficient, so we want to maintain the intermediate form for as long as possible -- ideally until the last possible minute (when the whole reduction is done.) The analogue of reducer/combiner in the functional forms is "accept a new element" (addElement) and "combine one intermediate form with another" (combineResults). Examples: 3. Average. Here, we use an array of two ints to hold sum and count. (Alternately we could use a custom tuple class.) Our intermediate form is int[2] and our final form is Double. Double average = integers.mReduce(() -> new int[2], (a, i) -> { a[0] += i; a[1]++; }, (a, b) -> { a[0] += b[0]; a[1] += b[1]; }, a -> (double) a[0] / a[1]); Here, we maintain the int[2] form all the way throughout the computation, including as we combine up the tree, and only convert to double at the last minute. 4. String concatenation. The signatures of the SAMs in mReduce were chosen to work with existing builder-y classes such as StringBuffer or ArrayList. We can do string concatenation using the functional form with String::concat, but it is inefficient -- lots of copying as we go up the tree. We can instead use a mutable fold to do the concatenation with StringBuilder and mReduce. It has the nice property that all the arguments already have methods with the right signature, so we can do it all with method refs. String s = strings.mReduce(StringBuilder::new, StringBuilder::append, StringBuilder::append, StringBuilder::toString); In this example, the two append method refs are targeting different overloads of StringBuilder.append; the first is append(String) and the second is append(StringBuilder). But the compiler will figure this out. 5. 
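The StringBuilder example works essentially unchanged against the collect method that later shipped (modulo the method name and the finisher being applied by the caller); a runnable sketch:

```java
import java.util.stream.Stream;

public class ConcatDemo {

    // Same ingredients as the mReduce version: a fresh builder per chunk,
    // append(String) to add an element, append(StringBuilder) to merge
    // chunks, toString() as the final conversion.
    public static String concat(Stream<String> strings) {
        return strings.collect(StringBuilder::new,
                               StringBuilder::append,   // append(String)
                               StringBuilder::append)   // append(StringBuilder)
                      .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(Stream.of("ab", "cd", "e"))); // abcde
    }
}
```

As the text notes, the compiler resolves the two append method refs to different overloads from the target functional-interface types; on an ordered parallel stream the combiner preserves encounter order, so the result is the same as the sequential run.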
toArray We can express "toArray" as a mutable fold using ArrayList to accumulate values and converting to an array at the end, just as with StringBuilder: Object[] array = foos.reduce(ArrayList::new, ArrayList::add, ArrayList::addAll, ArrayList::toArray); There are other mutable reduction use cases too. For example, sort can be implemented by providing a "insert in order" and a "merge sorted lists" method. While these are not necessarily the most efficient implementation, they may well make reasonable last-ditch defaults. Both of these examples use separate internal forms (StringBuffer, ArrayList) and external forms (String, array). Finally, for reasons that may become clearer in the next message, I think we should consider having an abstraction for "Reducer" or "Reduction" that captures all the bits needed for a reduction. This would allow the averager above to be reused: double average = integers.reduce(Reducers.INT_AVERAGER); This turns into a win when we try to recast groupBy/reduceBy into being general reductions (next message). So, summary: Functional forms: public U reduce(final U seed, final BinaryOperator op) { public Optional reduce(BinaryOperator op) { public R reduce(R base, Combiner reducer, BinaryOperator combiner) { Mutable form: public R reduce(Supplier baseFactory, BiBlock reducer, BiBlock combiner, Function finalResultMapper) { (and possibly a mutable form for special case where I=R) Possibly a form for a canned Reducer: public R reduce(Reducer reducer); From brian.goetz at oracle.com Mon Dec 10 13:46:36 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 10 Dec 2012 16:46:36 -0500 Subject: Forms for reduce() -- part 2 Message-ID: <50C6583C.90602@oracle.com> This second note is about the proposed overhaul of groupBy/reduceBy. I've already outlined why I don't like these methods -- they both tie the API to Map/Collection, while at the same time giving users relatively limited ability to handle more than a few simple cases. 
We don't *need* groupBy/reduceBy, since groupBy is just reduceBy (with the row reducer being reduce(ArrayList::new, ArrayList::add)) and reduceBy is just reduce, with a suitably complicated implementation of the reducer functions. But we don't want users to have to build their own groupBy/reduceBy out of pure reduce -- it's too much work, too error-prone, etc. We should provide canned versions. The current versions are an example of what we might do, but I think we can do much better. There is also the issue that I don't think we've correctly surfaced the issues surrounding when encounter order must be maintained, and when we can relax these for better performance. Currently, we guess whether a stream has a defined encounter order or not based on its source; Lists do, non-sorted-Sets don't. But this is only indirectly coupled to whether the user *cares* about this order or not. The "unordered()" hack is helpful, but its a hack. The second factor here is associativity. If your reducing function is merely associative, which is the only reasonable assumption, then you MUST care about order. But many are also commutative (sum, min, max). The user knows this, but the framework currently does not, so it has to be conservative. The two ways to do a mutable reduce are: - Use mutable containers for each leaf, and merge them as you go up the tree; - Use a single, shared, concurrent mutable container (like a CHM.) IF you don't care about encounter order, OR your reduction is commutative, you can use the latter approach -- which is more like a forEach than a reduce. Currently some ops have two implementations, a merging one and a contenting one, keyed off of ORDERED. This is both complex for us to maintain and also not always what the user wants. Better to let the user just say. Since all problems are solved by another layer of indirection, I will propose a new abstraction -- "tabulation". Tabulation is the process of creating a structured description of an input stream. 
Tabulations could describe:

- groupBy -- create a Map<U, Collection<T>> given a classifying function T->U
- reduceBy -- create a Map<U, V> given a classifying function T->U and a reducer that reduces a stream of T to V
- partition -- partition elements into two lists/arrays/etc based on some predicate
- materialized function application / joins -- given a function T->U, produce a Map<T, U> whose keys are the values of the stream and whose values are the result of applying the function
- more sophisticated combinations of the above, such as a two-level groupBy or reduceBy (Map<U1, Map<U2, V>>)

The latter cannot be handled at all with our current tools, nor can the current tools produce a Guava Multimap instead of a HashMap, or do Don's "groupByMulti". The only benefit to the current groupBy/reduceBy tools is that they automate some of the mechanics of producing fake multimaps. And even this advantage mostly goes away with the addition of Map.{merge,compute,computeIfAbsent}.

Use cases -- tabulation
-----------------------

Given the following domain model, here are some use cases for tabulation:

  class Document { Author author(); Editor editor(); int pages(); }

1. Group documents by author -- given a Stream<Document>, produce a MapLike<Author, CollectionLike<Document>>. Here, the user may want control over what kind of MapLike and what kind of CollectionLike to use.

2. Group documents by author and editor -- given a Stream<Document>, produce a Map<Author, Map<Editor, Collection<Document>>>.

3. Partition documents into collections of "shorter than 50 pages" and "longer than 50 pages".

4. Count of documents by author -- produce a Map<Author, Integer>.

5. Longest doc by author -- produce a Map<Author, Document>.

6. Sum of pages by author -- produce a Map<Author, Integer>.

7. Authors of documents joined with additional data -- produce a Map<Author, V> for some function Author->V.

Our current stuff can do (1), (4), (5), and (6), but not (2), (3), or (7) easily. Let's assume we have an interface Reducer<T, R> that describes a reduction (it takes a set of T values and produces an R.)
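As a point of comparison for these use cases, here is how several of them read against the collector design that eventually shipped as java.util.stream.Collectors (the Document domain type is a stub invented for the sketch; the real message's Author/Editor types are stood in by Strings):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TabulationUseCases {
    // Stand-in for the domain model above.
    static final class Document {
        final String author; final String editor; final int pages;
        Document(String author, String editor, int pages) {
            this.author = author; this.editor = editor; this.pages = pages;
        }
        String author() { return author; }
        String editor() { return editor; }
        int pages() { return pages; }
    }

    static List<Document> docs() {
        return Arrays.asList(new Document("ann", "ed", 30),
                             new Document("ann", "flo", 60),
                             new Document("bob", "ed", 40));
    }

    // Use case 1: group documents by author.
    static Map<String, List<Document>> byAuthor() {
        return docs().stream().collect(Collectors.groupingBy(Document::author));
    }

    // Use case 2: two-level grouping -- by author, then by editor.
    static Map<String, Map<String, List<Document>>> byAuthorAndEditor() {
        return docs().stream().collect(Collectors.groupingBy(Document::author,
                Collectors.groupingBy(Document::editor)));
    }

    // Use case 4: count of documents by author.
    static Map<String, Long> countsByAuthor() {
        return docs().stream().collect(Collectors.groupingBy(Document::author,
                Collectors.counting()));
    }

    // Use case 6: sum of pages by author.
    static Map<String, Integer> pageSumsByAuthor() {
        return docs().stream().collect(Collectors.groupingBy(Document::author,
                Collectors.summingInt(Document::pages)));
    }

    public static void main(String[] args) {
        System.out.println(countsByAuthor());   // e.g. {bob=1, ann=2}
        System.out.println(pageSumsByAuthor()); // e.g. {bob=40, ann=90}
    }
}
```

Use case (2), which the message notes the then-current tools could not express at all, falls out of composing one grouping collector inside another.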
Then we can create canned reducers, such as the Average from the previous note:

  static final Reducer<Integer, Double> AVERAGE = reducer(...)

Then we define:

  interface MutableTabulator<T, R> extends Reducer<T, R> { }

And we define combinators for tabulators like:

  // traditional groupBy
  // will want versions to default to HashMap/ArrayList
  static <T, U, C extends Collection<T>, M extends Map<U, C>>
  Tabulator<T, M> groupBy(Function<T, U> classifier,
                          Supplier<M> mapFactory,
                          Supplier<C> rowFactory) { ... }

  // nested groupBy
  static <T, U, D, M extends Map<U, D>>
  Tabulator<T, M> groupBy(Function<T, U> classifier,
                          Supplier<M> mapFactory,
                          Tabulator<T, D> downstreamTabulator) { ... }

  // Take a Stream<T> and a Function<T, U> and create a Map<T, U>
  static <T, U, M extends Map<T, U>>
  Tabulator<T, M> mappedTo(Function<T, U> mapper, Supplier<M> mapFactory) { ... }

  // What we were calling reduceBy
  // Will want other reduce variants
  static <T, U, R, M extends Map<U, R>>
  Tabulator<T, M> groupedReduce(Function<T, U> classifier,
                                Supplier<R> baseFactory,
                                BiBlock<R, T> acceptElement,
                                BiBlock<R, R> combiner) { }

These are easy to define and users can define their own. We'll have a dozen or two tabulator factories and combinators, not so bad, and if we forget one, no worry. Guava can easily define a groupBy that groups to a MultiMap. Etc. So, our use cases using these:

1. Group documents by author

  Map<Author, Collection<Document>> m =
    docs.tabulate(groupBy(Document::author, HashMap::new, ArrayList::new));

2. Group documents by author and editor

  Map<Author, Map<Editor, Collection<Document>>> m =
    docs.tabulate(groupBy(Document::author, HashMap::new, groupBy(Document::editor)));

3. Partition documents into collections of "shorter than 50 pages" and "longer than 50 pages"

  List<Collection<Document>> l =
    docs.tabulate(partitionBy(d -> d.pages() <= 50, ArrayList::new, ArrayList::add));

4. Count of documents by author -- produce a Map<Author, Integer>.

  Map<Author, Integer> m =
    docs.tabulate(groupedReduce(Document::author, () -> 0, (s, d) -> s + 1, Integer::plus));

5. Longest doc by author -- produce a Map<Author, Document>.

  Map<Author, Document> m =
    docs.tabulate(groupedReduce(Document::author,
                                (d1, d2) -> (d1.pages() >= d2.pages()) ? d1 : d2,
                                (d1, d2) -> (d1.pages() >= d2.pages()) ? d1 : d2));

6. Sum of pages by author -- produce a Map<Author, Integer>.
  Map<Author, Integer> m =
    docs.tabulate(groupedReduce(Document::author, () -> 0, (s, d) -> s + d.pages(), Integer::plus));

7. Authors of documents joined with additional data -- produce a Map<Author, V> for some function Author->V.

  Map<Author, V> m =
    documents.map(Document::author).uniqueElements()
             .tabulate(mapJoin(a -> f(a)));

Overall, I think these forms are fairly usable. There are a pile of factories and combinators for reducers/tabulators, which can be combined to do whatever you want. Users can create new ones, or can compose them into canned tabulators. The last bit is the separation of fold-style functional merging from contention-based "toss it in a big ConcurrentHashMap." I think the answer here is to have two forms of tabulators. Let's call them Tabulator and ConcurrentTabulator for sake of discussion. For each canned tabulator form, there'd be a merging and a concurrent form. Now the choice is in the user's hands; if they don't care about encounter order or have a better-than-associative combining function, they can select the latter, which is more like a forEach than a reduce. It sounds like a lot of new surface area, but it really isn't. We need a few forms of reduce, an abstraction for Reducer, an abstraction for Tabulator, and some factories and combinators which are individually trivial to write. It moves a lot of the "can't define your own stream ops" problems into a domain where users can define their own reducers and tabulators. From forax at univ-mlv.fr Mon Dec 10 16:05:30 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 11 Dec 2012 01:05:30 +0100 Subject: Constructing parallel streams In-Reply-To: References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> <50C604AF.2090007@oracle.com> Message-ID: <50C678CA.6030807@univ-mlv.fr> On 12/10/2012 05:01 PM, Joe Bowbeer wrote: > > I can easily imagine a pipeline that has alternating > sequential/parallel/sequential segments.
Is there any reason to > discourage a programmer from using the parallel/sequential methods to > express this? > You don't need a method sequential() or parallel() for that, you can use into.

  parallelStream.into(new ArrayList<>()).stream() is now sequential
  stream.into(new ArrayList<>()).parallel() is now parallel

and into() offers better control over the intermediate data structure. Rémi > On Dec 10, 2012 7:50 AM, "Brian Goetz" > wrote: > > I don't like users being able to call parallel in the middle > of the > stream construction. > > > I don't love it either. The semantics are perfectly tractable, > and the implementation is perfectly straightforward, but the > performance is unlikely to be a win in most cases. (I mentioned > earlier we would doc that this really should only be done at the > head of the pipeline.) > > I propose to have an interface ParallelizableStream that allows to > choose if the user wants the sequential or the parallel stream > upfront. > > > Yeah, we investigated this direction first. Combinatorial > explosion: IntParallelizableStream, etc. > > However, this could trivially become a dynamic property of streams > (fits easily into the existing stream flags mechanism). Then only > the head streams would have the property, and if you tried to do > parallel() farther down the stream, we could ignore it or even > throw ISE. >
The semantics are perfectly tractable, and > the implementation is perfectly straightforward, but the performance > is unlikely to be a win in most cases. (I mentioned earlier we would > doc that this really should only be done at the head of the pipeline.) > >> I propose to have an interface ParallelizableStream that allows to >> choose if the user wants the sequential or the parallel stream upfront. > > Yeah, we investigated this direction first. Combinatorial explosion: > IntParallelizableStream, etc. combinatorial explosion = 4: ParallelizableStream, IntParallelizableStream, LongParallelizableStream, DoubleParallelizableStream > > However, this could trivially become a dynamic property of streams > (fits easily into the existing stream flags mechanism). Then only the > head streams would have the property, and if you tried to do > parallel() farther down the stream, we could ignore it or even throw ISE. > see my answer to Joe, switching from sequential to parallel and vice versa doesn't really require ad hoc methods. Another problem with sequential and parallel is that their semantics are ambiguous: the semantics of every other method of Stream is that once the method is called, the previous stream is not valid anymore; but if parallel or sequential can return this as you suggest, that's no longer true. cheers, Rémi From brian.goetz at oracle.com Mon Dec 10 16:41:56 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 10 Dec 2012 19:41:56 -0500 Subject: Constructing parallel streams In-Reply-To: <50C678CA.6030807@univ-mlv.fr> References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> <50C604AF.2090007@oracle.com> <50C678CA.6030807@univ-mlv.fr> Message-ID: <50C68154.6030803@oracle.com> > You don't need a method sequential() or parallel() for that, you can use > into. Sure, if you don't care about performance. Especially if the stream is infinite...into will take a loooong time.
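For what it's worth, the "dynamic property of streams" direction Brian sketches is what shipped: sequential() and parallel() set a pipeline-wide mode flag (the last call wins) and copy nothing, while the into()-style detour materializes every element. A sketch of the contrast, with collect() standing in for the then-proposed into():

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ModeSwitchDemo {
    // Mode switching as shipped: parallel()/sequential() flip a pipeline-wide
    // flag (last call wins, so this pipeline runs sequentially); no copying.
    static long viaModeSwitch() {
        return IntStream.range(0, 1000)
                        .parallel()
                        .map(i -> i * 2)
                        .sequential()   // last call wins for the whole pipeline
                        .asLongStream()
                        .sum();
    }

    // The into()-style alternative: materialize into a list, then re-stream.
    // Correct, but copies every element -- and never terminates on an
    // infinite source, which is Brian's objection above.
    static long viaMaterialize() {
        List<Integer> tmp = IntStream.range(0, 1000).parallel().boxed()
                                     .collect(Collectors.toList());
        return tmp.stream().mapToLong(i -> i * 2).sum();
    }

    public static void main(String[] args) {
        System.out.println(viaModeSwitch() == viaMaterialize()); // true
    }
}
```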
From forax at univ-mlv.fr Mon Dec 10 17:04:14 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 11 Dec 2012 02:04:14 +0100 Subject: Constructing parallel streams In-Reply-To: <50C68154.6030803@oracle.com> References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> <50C604AF.2090007@oracle.com> <50C678CA.6030807@univ-mlv.fr> <50C68154.6030803@oracle.com> Message-ID: <50C6868E.8080304@univ-mlv.fr> On 12/11/2012 01:41 AM, Brian Goetz wrote: >> You don't need a method sequential() or parallel() for that, you can use >> into. > > Sure, if you don't care about performance. Especially if the stream > is infinite...into will take a loooong time. :) And what are the semantics of parallel on an infinite stream -- do you split the infinity? Rémi From brian.goetz at oracle.com Mon Dec 10 17:20:20 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 10 Dec 2012 20:20:20 -0500 Subject: Constructing parallel streams In-Reply-To: <50C6868E.8080304@univ-mlv.fr> References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> <50C604AF.2090007@oracle.com> <50C678CA.6030807@univ-mlv.fr> <50C68154.6030803@oracle.com> <50C6868E.8080304@univ-mlv.fr> Message-ID: <50C68A54.3090500@oracle.com> > And what are the semantics of parallel on an infinite stream -- do you split > the infinity? It is perfectly sensible. We can take infinite generators and parallelize them easily enough; see the implementation of Iterators.spliterator(). The only challenge is "how do you make it stop", if you care about that. Which is why we're talking about cancelation.
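Brian's point can be seen with the API that shipped: an infinite generator can be split for parallel execution, and limit() plays the "how do you make it stop" role the thread is circling around.

```java
import java.util.stream.Stream;

public class InfiniteDemo {
    // An infinite generator, parallelized; limit() truncates it so the
    // terminal operation can complete.
    static long sumOfFirstSquares(int n) {
        return Stream.iterate(1L, x -> x + 1)  // 1, 2, 3, ... unbounded
                     .limit(n)                 // the "make it stop" part
                     .parallel()
                     .mapToLong(x -> x * x)
                     .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfFirstSquares(10)); // 385
    }
}
```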
From forax at univ-mlv.fr Mon Dec 10 23:47:26 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 11 Dec 2012 08:47:26 +0100 Subject: Constructing parallel streams In-Reply-To: <50C68A54.3090500@oracle.com> References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> <50C604AF.2090007@oracle.com> <50C678CA.6030807@univ-mlv.fr> <50C68154.6030803@oracle.com> <50C6868E.8080304@univ-mlv.fr> <50C68A54.3090500@oracle.com> Message-ID: <50C6E50E.8090002@univ-mlv.fr> On 12/11/2012 02:20 AM, Brian Goetz wrote: >> And what are the semantics of parallel on an infinite stream -- do you split >> the infinity? > > It is perfectly sensible. We can take infinite generators and > parallelize them easily enough; see the implementation of > Iterators.spliterator(). The only challenge is "how do you make it > stop", if you care about that. Which is why we're talking about > cancelation. You are saying that you have a way to limit the stream, so why is it different from parallelStream.limit(...).into(new ArrayList<>()).stream(), with limit using the semantics suggested by Sam? Rémi From paul.sandoz at oracle.com Tue Dec 11 08:34:55 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 11 Dec 2012 17:34:55 +0100 Subject: Constructing parallel streams In-Reply-To: <50C678CA.6030807@univ-mlv.fr> References: <50C371C4.6000905@oracle.com> <50C5FF2F.1020408@univ-mlv.fr> <50C604AF.2090007@oracle.com> <50C678CA.6030807@univ-mlv.fr> Message-ID: On Dec 11, 2012, at 1:05 AM, Remi Forax wrote: > On 12/10/2012 05:01 PM, Joe Bowbeer wrote: >> >> I can easily imagine a pipeline that has alternating sequential/parallel/sequential segments. Is there any reason to discourage a programmer from using the parallel/sequential methods to express this? >> > > You don't need a method sequential() or parallel() for that, you can use into.
> > parallelStream.into(new ArrayList<>()).stream() is now sequential > stream.into(new ArrayList<>()).parallel() is now parallel > > and into() offer a better control on the intermediary data structure. > That just results in unnecessary copying, the internal representation can be more optimal. e.g.: parStream.sequential().forEach(...) compared to: parStream.into(new ArrayList<>()).forEach(...) Paul. From mike.duigou at oracle.com Thu Dec 13 21:24:30 2012 From: mike.duigou at oracle.com (Mike Duigou) Date: Thu, 13 Dec 2012 21:24:30 -0800 Subject: RFR : CR8004015 : Add parent interfaces and default methods to basic functional interfaces Message-ID: <99DC2763-ED44-42CA-90F3-04AD07B2846D@oracle.com> Hello all; I have updated the webrev again for hopefully the last time: http://cr.openjdk.java.net/~mduigou/8004015/3/webrev/ http://cr.openjdk.java.net/~mduigou/8004015/3/specdiff/overview-summary.html The implementation now uses Primitive.primitiveValue() ie. Integer.integerValue() rather than a cast. Same bytecode but using the intrinsic function makes it more clear that result is either primitive or NPE and that CCE is not possible. I have added @throws NPE for a number of the default methods. We won't be including @throws NPE in all cases where null is disallowed because when the @throws NPE is declared the API is required to throw NPE in that circumstance. So for cases where the NPE is "naturally" thrown or that aren't performance sensitive we will likely add @throws NPE declarations but for performance sensitive methods we won't be adding explicit null checks to match a @throws NPE specification. There's a tradeoff here in some cases. Please feel free to quibble about specific cases as they occur. 
:-) Mike From david.holmes at oracle.com Thu Dec 13 22:28:41 2012 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Dec 2012 16:28:41 +1000 Subject: RFR : CR8004015 : Add parent interfaces and default methods to basic functional interfaces In-Reply-To: <99DC2763-ED44-42CA-90F3-04AD07B2846D@oracle.com> References: <99DC2763-ED44-42CA-90F3-04AD07B2846D@oracle.com> Message-ID: <50CAC719.7090308@oracle.com> HI Mike, On 14/12/2012 3:24 PM, Mike Duigou wrote: > Hello all; > > I have updated the webrev again for hopefully the last time: > > http://cr.openjdk.java.net/~mduigou/8004015/3/webrev/ > http://cr.openjdk.java.net/~mduigou/8004015/3/specdiff/overview-summary.html > > The implementation now uses Primitive.primitiveValue() ie. Integer.integerValue() rather than a cast. Same bytecode but using the intrinsic function makes it more clear that result is either primitive or NPE and that CCE is not possible. > > I have added @throws NPE for a number of the default methods. We won't be including @throws NPE in all cases where null is disallowed because when the @throws NPE is declared the API is required to throw NPE in that circumstance. So for cases where the NPE is "naturally" thrown or that aren't performance sensitive we will likely add @throws NPE declarations but for performance sensitive methods we won't be adding explicit null checks to match a @throws NPE specification. There's a tradeoff here in some cases. Please feel free to quibble about specific cases as they occur. :-) That doesn't make sense to me. The throwing of the NPE is intended to be part of the specification not an implementation choice. Further @param foo non-null, is just as binding on implementations as @throws NPE if foo is null. ??? I think defining the NPE via the @param and @throws is a lose-lose situation: ! * @param left {@inheritDoc}, must be non-null ! * @param right {@inheritDoc}, must be non-null ! * @return {@inheritDoc}, always non-null ! 
* @throws NullPointerException if {@code left} or {@code right} is null You only need one convention. David ----- > Mike From david.holmes at oracle.com Fri Dec 14 05:06:58 2012 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Dec 2012 23:06:58 +1000 Subject: The implementation of default methods Message-ID: <50CB2472.2000306@oracle.com> Okay we now have several threads across the public lists debating the issue of whether the implementations of default methods that we are adding to the core libraries are the required implementation for those default methods or simply an implementation. Can we get a definitive position on this? Thanks, David From dl at cs.oswego.edu Fri Dec 14 05:20:58 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 08:20:58 -0500 Subject: The implementation of default methods In-Reply-To: <50CB2472.2000306@oracle.com> References: <50CB2472.2000306@oracle.com> Message-ID: <50CB27BA.90003@cs.oswego.edu> On 12/14/12 08:06, David Holmes wrote: > Okay we now have several threads across the public lists debating the issue of > whether the implementations of default methods that we are adding to the core > libraries are the required implementation for those default methods or simply an > implementation. > > Can we get a definitive position on this? > My vote is to use the form and style I showed for Map; still at http://gee.cs.oswego.edu/dl/wwwtmp/apis/Map.html Main idea: 1. A sentence or two of basic spec. 2. "The default implementation is equivalent to ..." (note that it must always be possible to say this for anything default implementable.) 3. Any constraints on overrides for implementors 4. Other notes, clarifications, examples, advice. 5. 
params/return/throw specs -Doug From forax at univ-mlv.fr Fri Dec 14 06:05:20 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 14 Dec 2012 15:05:20 +0100 Subject: Collection.addAll(stream) should not be named addAll Message-ID: <50CB3220.2000202@univ-mlv.fr> One fundamental thing to check when you do overloading is that the two overloaded methods should have exactly the same semantics; clearly Collection.addAll(Collection) and Collection.addAll(Stream) have different semantics with respect to order, at the least. Also, addAll(Stream) doesn't return whether the collection was mutated. I have no idea what name to use, but not addAll. Rémi From forax at univ-mlv.fr Fri Dec 14 06:39:31 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 14 Dec 2012 15:39:31 +0100 Subject: Stream, spliterator, supplier and size Message-ID: <50CB3A23.90407@univ-mlv.fr> Brian explained why there are methods in Streams that take a Supplier and the flags in a previous mail (I'm too lazy to find it now):

  Stream<T> stream(Supplier<Spliterator<T>> supplier, int flags)

I have trouble understanding why we need to expose two semantics to our poor users; I think it's better to decide whether (1) the spliterator is created when collection.stream() is called, or (2) the spliterator is created when a terminal operation like stream.forEach is called. It has severe implications on the way the pipeline works under the hood, because the pipeline ops may rely on the size of the collection, which may be different if the collection is mutated between the creation of the stream and the call to the terminal operation.
cheers, R?mi From forax at univ-mlv.fr Fri Dec 14 06:49:47 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 14 Dec 2012 15:49:47 +0100 Subject: Stream, spliterator, supplier and size In-Reply-To: <50CB3A23.90407@univ-mlv.fr> References: <50CB3A23.90407@univ-mlv.fr> Message-ID: <50CB3C8B.90303@univ-mlv.fr> On 12/14/2012 03:39 PM, Remi Forax wrote: > Brian explains why there is methods in Streams that takes a Supplier > and the flags in a previous mail > (I'm too lazy to find it now). > Stream stream(Supplier> supplier, int flags) > > I've trouble to understand why we need to expose two semantics to our > poor users, > I think it's better to decide whenever (1) the spliterator is created > when collection.stream() is called > or (2) the spliterator is created when a terminal operation like > stream.forEach is called. > > It has severe implications on the way the pipeline works under the > hood because the pipeline ops may relies on the size of the collection > which may be different if the collection is mutated between the > creation of the stream and the call to the terminal operation. > > cheers, > R?mi > After thinking a little bit more, I will vote for (1). We have decided that a Stream was more an Iterator than an Iterable, that's why we have decided to 'close' the stream after use i.e. to not reuse a Stream. 
Rémi From dl at cs.oswego.edu Fri Dec 14 06:52:55 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 09:52:55 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50C21662.8050308@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> Message-ID: <50CB3D47.5050106@cs.oswego.edu> Back to this after several diversions... On 12/07/12 11:16, Doug Lea wrote: >> Basic idea: defaults for function-accepting Map methods are solely >> in terms of the 4 CM methods, which are in turn non-atomic for non-CM.... > Unfortunately the null-value ambiguity hits yet again when moving from writing specs to writing default implementations. (Have I mentioned lately how terrible it is to allow nulls? :-) The defaults for function-accepting methods must rely not only on these 4 CHM methods, but also on get and/or containsKey. For null-accepting maps, you need the pair of them to default-implement (non-atomically), but for others, you must not use the pair of them (just get) to propagate the property that if putIfAbsent is thread-safe then so is computeIfAbsent. The only way out I see is to even further sacrifice sensibility for null-accepting maps, by saying that the methods are allowed to treat absence and mapping to null identically and that the default implementation does so. Here's computeIfAbsent. Any complaints? /** * If the specified key is not already associated with a value (or * is mapped to {@code null}), attempts to compute its value using * the given mapping function and enters it into the map unless * {@code null}.
The default implementation is equivalent to the * following, then returning the current value or {@code null} if * absent: * *

      * <pre> {@code
      * if (map.get(key) == null) {
      *   V newValue = mappingFunction.apply(key);
      *   if (newValue != null)
      *      map.putIfAbsent(key, newValue);
      * }}</pre>
* * If the function returns {@code null} no mapping is recorded. If * the function itself throws an (unchecked) exception, the * exception is rethrown to its caller, and no mapping is * recorded. The most common usage is to construct a new object * serving as an initial mapped value or memoized result, as in: * *
      * <pre> {@code
      * map.computeIfAbsent(key, k -> new Value(f(k)));}</pre>
      *

The default implementation makes no guarantees about * synchronization or atomicity properties of this method or the * application of the mapping function. Any class overriding this * method must specify its concurrency properties. In particular, * all implementations of subinterface {@link * java.util.concurrent.ConcurrentMap} must document whether the * function is applied once atomically only if the value is not * present. Any class that permits null values must document * whether and how this method distinguishes absence from null * mappings. * * @param key key with which the specified value is to be associated * @param mappingFunction the function to compute a value * @return the current (existing or computed) value associated with * the specified key, or null if the computed value is null * @throws NullPointerException if the specified key is null and * this map does not support null keys, or the * mappingFunction is null * @throws UnsupportedOperationException if the put operation * is not supported by this map * @throws ClassCastException if the class of the specified key or value * prevents it from being stored in this map * @throws RuntimeException or Error if the mappingFunction does so, * in which case the mapping is left unestablished */ default V computeIfAbsent(K key, Function mappingFunction) { V v, newValue; return ((v = get(key)) == null && (newValue = mappingFunction.apply(key)) != null && (v = putIfAbsent(key, newValue)) == null) ? newValue : v; } From david.lloyd at redhat.com Fri Dec 14 07:12:32 2012 From: david.lloyd at redhat.com (David M. 
Lloyd) Date: Fri, 14 Dec 2012 09:12:32 -0600 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB3D47.5050106@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> Message-ID: <50CB41E0.7080507@redhat.com> On 12/14/2012 08:52 AM, Doug Lea wrote: > > Back to this after several diversions... > > On 12/07/12 11:16, Doug Lea wrote: >>> Basic idea: defaults for function-accepting Map methods are solely >>> in terms of the 4 CM methods, which are in turn non-atomic for >>> non-CM.... >> > > Unfortunately the null-value ambiguity hits yet again when > moving from writing specs to writing default implementations. > (Have I mentioned lately how terrible it is to allow nulls? :-) > > The defaults for function-accepting methods must rely not > only on these 4 CHM methods, but also on get and/or containsKey. > For null-accepting maps, you need the pair of them > to default-implement (non-atomically) but for others, > you must not use the pair of them (just get) to propagate > the property that if putIfAbsent is thread-safe then so is > computeIfAbsent. > > The only way out I see is to even further sacrifice > sensibility for null-accepting maps, by saying that the > methods are allowed to treat absence and mapping to null > identically and that the default implementation does so. > Here's computeIfAbsent. Any complaints? > > /** > * If the specified key is not already associated with a value (or > * is mapped to {@code null)), attempts to compute its value using > * the given mapping function and enters it into the map unless > * {@code null}. 
The default implementation is equivalent to the > * following, then returning the current value or {@code null} if > * absent: > * > *

>       * <pre> {@code
>       * if (map.get(key) == null) {
>       *   V newValue = mappingFunction.apply(key);
>       *   if (newValue != null)
>       *      map.putIfAbsent(key, newValue);
>       * }}</pre>
> * > * If the function returns {@code null} no mapping is recorded. If > * the function itself throws an (unchecked) exception, the > * exception is rethrown to its caller, and no mapping is > * recorded. The most common usage is to construct a new object > * serving as an initial mapped value or memoized result, as in: > * > *
>       * <pre> {@code
>       * map.computeIfAbsent(key, k -> new Value(f(k)));}</pre>
> *

The default implementation makes no guarantees about > * synchronization or atomicity properties of this method or the > * application of the mapping function. Any class overriding this > * method must specify its concurrency properties. In particular, > * all implementations of subinterface {@link > * java.util.concurrent.ConcurrentMap} must document whether the > * function is applied once atomically only if the value is not > * present. Any class that permits null values must document > * whether and how this method distinguishes absence from null > * mappings. > * > * @param key key with which the specified value is to be associated > * @param mappingFunction the function to compute a value > * @return the current (existing or computed) value associated with > * the specified key, or null if the computed value is null > * @throws NullPointerException if the specified key is null and > * this map does not support null keys, or the > * mappingFunction is null > * @throws UnsupportedOperationException if the put operation > * is not supported by this map > * @throws ClassCastException if the class of the specified key or > value > * prevents it from being stored in this map > * @throws RuntimeException or Error if the mappingFunction does so, > * in which case the mapping is left unestablished > */ > default V computeIfAbsent(K key, Function > mappingFunction) { > V v, newValue; > return ((v = get(key)) == null && > (newValue = mappingFunction.apply(key)) != null && > (v = putIfAbsent(key, newValue)) == null) ? 
newValue : v; > } > What's wrong with: default V computeIfAbsent(K key, Function mappingFunction) { V v, newValue; if ((v = get(key)) != null) return v; newValue = mappingFunction.apply(key); return putIfAbsent(key, newValue); } -- - DML From dl at cs.oswego.edu Fri Dec 14 07:17:09 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 10:17:09 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB41E0.7080507@redhat.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB41E0.7080507@redhat.com> Message-ID: <50CB42F5.5000602@cs.oswego.edu> On 12/14/12 10:12, David M. Lloyd wrote: > What's wrong with: > > default V computeIfAbsent(K key, Function > mappingFunction) { > V v, newValue; > if ((v = get(key)) != null) return v; > newValue = mappingFunction.apply(key); > return putIfAbsent(key, newValue); > } putIfAbsent uses the "put" convention of returning the previous value or null if absent. (Arguably this was a mistake, but not one we can do anything about.) -Doug From brian.goetz at oracle.com Fri Dec 14 07:24:12 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 10:24:12 -0500 Subject: Stream, spliterator, supplier and size In-Reply-To: <50CB3A23.90407@univ-mlv.fr> References: <50CB3A23.90407@univ-mlv.fr> Message-ID: <50CB449C.5070103@oracle.com> > I've trouble to understand why we need to expose two semantics to our > poor users, But I don't think we've exposed them to the users! The stream() methods are not for users, they are for library writers to implement stream-producing methods. 
If users are using them then that means we've forgotten to provide something else. (Writing Iterators is a pain in the neck, much more so. But again, most users don't write iterators.) Your other argument about "we should bind to the data earlier" is a reasonable thing to discuss from a semantic perspective, but not because the API for making streams is too hard. From brian.goetz at oracle.com Fri Dec 14 07:37:57 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 10:37:57 -0500 Subject: Stream, spliterator, supplier and size In-Reply-To: <50CB3A23.90407@univ-mlv.fr> References: <50CB3A23.90407@univ-mlv.fr> Message-ID: <50CB47D5.5080903@oracle.com>
> Stream<T> stream(Supplier<Spliterator<T>> supplier, int flags) > > I've trouble to understand why we need to expose two semantics to our > poor users, > I think it's better to decide whether (1) the spliterator is created > when collection.stream() is called > or (2) the spliterator is created when a terminal operation like > stream.forEach is called. > > It has severe implications on the way the pipeline works under the hood > because the pipeline ops may rely on the size of the collection which > may be different if the collection is mutated between the creation of > the stream and the call to the terminal operation. > > cheers, > Rémi > From david.lloyd at redhat.com Fri Dec 14 07:47:35 2012 From: david.lloyd at redhat.com (David M. Lloyd) Date: Fri, 14 Dec 2012 09:47:35 -0600 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB42F5.5000602@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB41E0.7080507@redhat.com> <50CB42F5.5000602@cs.oswego.edu> Message-ID: <50CB4A17.8060509@redhat.com> On 12/14/2012 09:17 AM, Doug Lea wrote: > On 12/14/12 10:12, David M. Lloyd wrote: > >> What's wrong with: >> >> default V computeIfAbsent(K key, Function >> mappingFunction) { >> V v, newValue; >> if ((v = get(key)) != null) return v; >> newValue = mappingFunction.apply(key); >> return putIfAbsent(key, newValue); >> } > > putIfAbsent uses the "put" convention of returning the previous value > or null if absent. (Arguably this was a mistake, but not one we can > do anything about.)
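[A minimal runnable sketch, not part of the original thread, of the convention Doug describes above; the class name and values are illustrative. `putIfAbsent` returns the *previous* mapping, so the proposed default would hand `null` back to the caller exactly when a new value has just been computed and inserted:]

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentConvention {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();

        // "put" convention: the previous value is returned, or null if absent.
        Integer prev = map.putIfAbsent("k", 1);
        System.out.println(prev);          // null -- there was no prior mapping

        // A computeIfAbsent default ending in "return putIfAbsent(key, newValue);"
        // would therefore return null on a successful first insertion,
        // hiding the freshly computed value from the caller.
        System.out.println(map.get("k"));  // 1
    }
}
```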
Oh I see - the point I was missing is that computeIfAbsent is specified to return the *new* value if the mapping is replaced, instead of returning null as putIfAbsent does. Now given the precedent of putIfAbsent, why change the behavior in this case? Is it just because we are arguing that it was in fact a mistake? -- - DML From dl at cs.oswego.edu Fri Dec 14 08:03:00 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 11:03:00 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB3D47.5050106@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> Message-ID: <50CB4DB4.5090000@cs.oswego.edu> I placed a full candidate version of Map.java at http://gee.cs.oswego.edu/dl/wwwtmp/apis/Map.java At the moment I can't compile or javadoc this under current setup (working on it...) but if anyone wants to do so please let me know of problems. Otherwise, feel free to check in. I kept the four names, computeIfAbsent, computeIfPresent, compute, and merge. I agree that these are not wonderful but our experience with the many CHMV8 users is that people seem OK with them. Two flurries of traffic didn't arrive at anything a lot better. The main constraint is that many people are familiar with "computeIfAbsent", and the other names mostly fall out from there. I included (and added spec for) forEach(BiBlock) that is in the current lambda version. Not completely sure about spec though. Are there explicit seq/par versions? 
-Doug From dl at cs.oswego.edu Fri Dec 14 08:08:52 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 11:08:52 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB4A17.8060509@redhat.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB41E0.7080507@redhat.com> <50CB42F5.5000602@cs.oswego.edu> <50CB4A17.8060509@redhat.com> Message-ID: <50CB4F14.9040709@cs.oswego.edu> On 12/14/12 10:47, David M. Lloyd wrote: > Oh I see - the point I was missing is that computeIfAbsent is specified to > return the *new* value if the mapping is replaced, instead of returning null as > putIfAbsent does. > > Now given the precedent of putIfAbsent, why change the behavior in this case? > Is it just because we are arguing that it was in fact a mistake? > For computeIfAbsent, the method must return the computed value to you or else you won't know it. The tradeoff is that the method cannot tell you whether you've computed it vs it was already there. The putIfAbsent method does provide this old vs new information, so is in principle more powerful in this restricted sense. But it is also susceptible to mistakes when people use the return value directly rather than checking if null. -Doug From forax at univ-mlv.fr Fri Dec 14 08:06:55 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 14 Dec 2012 17:06:55 +0100 Subject: Stream, spliterator, supplier and size In-Reply-To: <50CB47D5.5080903@oracle.com> References: <50CB3A23.90407@univ-mlv.fr> <50CB47D5.5080903@oracle.com> Message-ID: <50CB4E9F.3020201@univ-mlv.fr> On 12/14/2012 04:37 PM, Brian Goetz wrote: > By the way, it used to work as you suggest. 
Writing spliterators for > collections like ArrayList was very painful, because you still had to > check that the collection hadn't been modified since the spliterator > was captured (otherwise you might be iterating the wrong array.) > Which meant that you had to write a proxy spliterator which did checks > the first time iterator() / split() / forEach() were called, before > passing control onto the array spliterator. This was messy. Now, we > just pass > > () -> Arrays.spliterator(array, 0, size) > > to stream() and we're done! > > So the alternative formulation is actually much worse for anyone who > wants to make a Stream if they are wrapping it around a mutable > collection. > > Also we have stream(Spliterator) sitting right next to it so if you > are ready to bind immediately, you can. It was painful because you try to implement semantics (2) but take the spliterator when creating the stream. You can also choose semantics (1) and have no implementation issue. I'm not married to either semantics, but I think this should be clear. On 12/14/2012 04:24 PM, Brian Goetz wrote: > > But I don't think we've exposed them to the users! The stream() > methods are not for users, they are for library writers to implement > stream-producing methods. If users are using them then that means > we've forgotten to provide something else. (Writing Iterators is a > pain in the neck, much more so. But again, most users don't write > iterators.) Library writers are users. Everything we expose will be used.
Rémi > > On 12/14/2012 9:39 AM, Remi Forax wrote: >> Brian explains why there is methods in Streams that takes a Supplier and >> the flags in a previous mail >> (I'm too lazy to find it now). >> Stream<T> stream(Supplier<Spliterator<T>> supplier, int flags) >> >> I've trouble to understand why we need to expose two semantics to our >> poor users, >> I think it's better to decide whether (1) the spliterator is created >> when collection.stream() is called >> or (2) the spliterator is created when a terminal operation like >> stream.forEach is called. >> >> It has severe implications on the way the pipeline works under the hood >> because the pipeline ops may rely on the size of the collection which >> may be different if the collection is mutated between the creation of >> the stream and the call to the terminal operation. >> >> cheers, >> Rémi >> From brian.goetz at oracle.com Fri Dec 14 08:10:36 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 11:10:36 -0500 Subject: Stream, spliterator, supplier and size In-Reply-To: <50CB4E9F.3020201@univ-mlv.fr> References: <50CB3A23.90407@univ-mlv.fr> <50CB47D5.5080903@oracle.com> <50CB4E9F.3020201@univ-mlv.fr> Message-ID: <50CB4F7C.6020804@oracle.com> >> But I don't think we've exposed them to the users! The stream() >> methods are not for users, they are for library writers to implement >> stream-producing methods. If users are using them then that means >> we've forgotten to provide something else. (Writing Iterators is a >> pain in the neck, much more so. But again, most users don't write >> iterators.) > > Library writers are users. Everything we expose will be used. What we're exposing to library writers is easy! Here's the implementation in ArrayList: @Override public Stream<E> stream() { return Streams.stream(() -> Arrays.spliterator((E[]) elementData, 0, size), StreamOpFlag.IS_ORDERED | StreamOpFlag.IS_SIZED); } Before, it was much harder for ArrayList.
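[The late-binding behavior Brian describes can be sketched with the API shape that eventually shipped in Java 8, where `StreamSupport.stream(Supplier, int, boolean)` plays the role of the prototype's `Streams.stream(supplier, flags)`; the shipped names are used here, not the prototype's:]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LateBinding {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));

        // The supplier defers capturing the spliterator until a terminal
        // operation runs, so no modification check is needed in between.
        Supplier<Spliterator<Integer>> supplier = list::spliterator;
        Stream<Integer> s = StreamSupport.stream(
                supplier, Spliterator.ORDERED | Spliterator.SIZED, false);

        list.add(4);                   // mutate after stream creation...
        System.out.println(s.count()); // 4 -- binding happened at count()
    }
}
```

The supplier is invoked at most once, when the terminal operation begins, which is exactly the "bind late" semantics under discussion.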
From brian.goetz at oracle.com Fri Dec 14 08:13:28 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 11:13:28 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB4F14.9040709@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB41E0.7080507@redhat.com> <50CB42F5.5000602@cs.oswego.edu> <50CB4A17.8060509@redhat.com> <50CB4F14.9040709@cs.oswego.edu> Message-ID: <50CB5028.2000509@oracle.com> As a user, I have found the computeIfAbsent semantics more useful. On 12/14/2012 11:08 AM, Doug Lea wrote: > On 12/14/12 10:47, David M. Lloyd wrote: > >> Oh I see - the point I was missing is that computeIfAbsent is >> specified to >> return the *new* value if the mapping is replaced, instead of >> returning null as >> putIfAbsent does. >> >> Now given the precedent of putIfAbsent, why change the behavior in >> this case? >> Is it just because we are arguing that it was in fact a mistake? >> > > For computeIfAbsent, the method must return the computed value > to you or else you won't know it. The tradeoff is that the method > cannot tell you whether you've computed it vs it was already there. > The putIfAbsent method does provide this old vs new information, > so is in principle more powerful in this restricted sense. > But it is also susceptible to mistakes when people use > the return value directly rather than checking if null. 
> > -Doug > > > From forax at univ-mlv.fr Fri Dec 14 08:15:16 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 14 Dec 2012 17:15:16 +0100 Subject: IntIterator Message-ID: <50CB5094.9050805@univ-mlv.fr> The current prototype declares a class IntIterator in java.util.stream.primitive, What is the status of this package ? I don't think you need to make it visible for users, it can be used internally without being exported and seen by everyone. You can also remove the method iterator() and spliterator() from IntIterator given those are escape hatches and that users can already do boxed().iterator() and boxed().spliterator(). Rémi From paul.sandoz at oracle.com Fri Dec 14 08:30:23 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 14 Dec 2012 17:30:23 +0100 Subject: IntIterator In-Reply-To: <50CB5094.9050805@univ-mlv.fr> References: <50CB5094.9050805@univ-mlv.fr> Message-ID: On Dec 14, 2012, at 5:15 PM, Remi Forax wrote: > The current prototype declare a class IntIterator in java.util.stream.primitive, > What is the status of this package ? > Ongoing... it changed today, I consolidated IntIterator into PrimitiveIterator. > I don't think you need to make it visible for users, it can be used internally without being exported and seen by everyone. > You can also remove the method iterator() and spliterator() from IntIterator given those are escape hatch and that users > can already do boxed().iterator() and boxed().spliterator(). > Library writers/integrators may need to create primitive streams from appropriate sources and it would be nice to avoid boxing on the input. In source repo tip PrimitiveStreams is the primitive version of Streams. Paul.
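[For the record, the consolidated iterator survived into the shipped Java 8 API as `java.util.PrimitiveIterator.OfInt`. A boxing-free bridge from such an iterator to an `IntStream` can be sketched with the shipped names, which differ from the prototype's `PrimitiveStreams`:]

```java
import java.util.PrimitiveIterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

public class NoBoxing {
    public static void main(String[] args) {
        // A primitive-int iterator: nextInt() means no Integer boxes are created.
        PrimitiveIterator.OfInt squares = new PrimitiveIterator.OfInt() {
            private int i = 0;
            public boolean hasNext() { return i < 5; }
            public int nextInt() { int v = i * i; i++; return v; }
        };

        // Wrap it as a Spliterator.OfInt and build an IntStream on top,
        // never going through boxed().iterator().
        Spliterator.OfInt split =
                Spliterators.spliterator(squares, 5, Spliterator.ORDERED);
        IntStream s = StreamSupport.intStream(split, false);

        System.out.println(s.sum()); // 0 + 1 + 4 + 9 + 16 = 30
    }
}
```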
From brian.goetz at oracle.com Fri Dec 14 08:30:29 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 11:30:29 -0500 Subject: The implementation of default methods In-Reply-To: <50CB2472.2000306@oracle.com> References: <50CB2472.2000306@oracle.com> Message-ID: <50CB5425.6000403@oracle.com> There are several layers of this issue. 1. How to document implementation properties of the default in a way that does *not* commit to doing things this way forever. This is analogous to "This implementation currently..." notes that are sometimes seen in class method documentation, and I don't think requires much more than that. For example, if the default implementation uses a bad algorithm, it could document this as "this implementation currently uses an O(n^2) algorithm; subclasses that care about performance should probably provide a better implementation." This does not commit us to maintaining a bad algorithm forever, but does provide useful detail to users and subclass maintainers. 2. How to document what a subclass can count on, so they can make a decision as to whether they need to override the method or not. Here, Doug's examples with putIfAbsent and friends are good; we document "this implementation behaves as if...", which is a kind of specification for the default, and the default had better continue to behave that way. 3. Retrofitted defaults in already-specified classes. This is the Iterator.remove issue being discussed on corelibs, where the issue is whether adding a default for remove that throws UOE now becomes part of the specification for Iterator.remove, and whether all compliant implementations must have a similarly-behaving Iterator.remove. This one is trickier than (2) because it may be changing the specification of existing code. So the question here is, what does it mean to add a default on Iterator.remove? Does that make undue work for JDK vendors who don't use our java.util classes?
How do we write the spec to make it clear that having a default that throws UOE *is* part of the spec now? Here's my proposed answers: 1. Document "Implementation note: The default implementation currently..." 2. Document "The default implementation behaves as if..." (Or whatever Doug's proposed wording is.) 3. Document "The default implementation MUST" On 12/14/2012 8:06 AM, David Holmes wrote: > Okay we now have several threads across the public lists debating the > issue of whether the implementations of default methods that we are > adding to the core libraries are the required implementation for those > default methods or simply an implementation. > > Can we get a definitive position on this? > > Thanks, > David From brian.goetz at oracle.com Fri Dec 14 08:42:01 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 11:42:01 -0500 Subject: IntIterator In-Reply-To: References: <50CB5094.9050805@univ-mlv.fr> Message-ID: <50CB56D9.1000805@oracle.com> In general, we'd like to minimize the set of classes we make public, ideally the *Stream classes and the minimum necessary that they pull in. The "SPI" (the Op classes and supporting types) are definitely going to be private for 8, though we would like to get them to the point where we can open it up for anyone to build ops. This may mean collapsing the various subpackages of j.u.stream. Exposing IntIterator would be unfortunate because it invites users to "pull on the string" and ask "where's IntIterable", "where's IntList", "where's IntArrayList", etc. We'd like for what we release to be as "closed" as possible in the topological sense. On 12/14/2012 11:30 AM, Paul Sandoz wrote: > > On Dec 14, 2012, at 5:15 PM, Remi Forax wrote: > >> The current prototype declare a class IntIterator in java.util.stream.primitive, >> What is the status of this package ? >> > > Ongoing... it changed today, I consolidated IntIterator into PrimitiveIterator. 
> > >> I don't think you need to make it visible for users, it can be used internally without being exported and seen by everyone. >> You can also remove the method iterator() and spliterator() from IntIterator given those are escape hatch and that users >> can already do boxed().iterator() and boxed().spliterator(). >> > > Library writers/integrators may need to create primitive streams from appropriate sources and it would be nice to avoid boxing on the input. > > In source repo tip PrimitiveStreams is the primitive version of Streams. > > Paul. > From dl at cs.oswego.edu Fri Dec 14 08:45:21 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 11:45:21 -0500 Subject: The implementation of default methods In-Reply-To: <50CB5425.6000403@oracle.com> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> Message-ID: <50CB57A1.7060707@cs.oswego.edu> On 12/14/12 11:30, Brian Goetz wrote: > 1. Document "Implementation note: The default implementation currently..." As always, the fewer of these the better. In j.u/j.u.c, these are used mostly for resource limitations (like max threads in FJP) that might someday be lifted. > > 2. Document "The default implementation behaves as if..." (Or whatever Doug's > proposed wording is.) In j.u.c, we always say "is behaviorally equivalent to" but I dropped the "behaviorally" in Map candidate because someone once told me it was overly pedantic :-) > > 3. Document "The default implementation MUST" Isn't this just the normal spec part, that should precede the default implementation part? 
-Doug From brian.goetz at oracle.com Fri Dec 14 08:46:18 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 11:46:18 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB3D47.5050106@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> Message-ID: <50CB57DA.9040908@oracle.com> No complaint from me. The alternative seems worse; it says that because of the existence of null-supporting maps, no one can have these nice new features. But, I am not quite following your argument. Why does it fail if you replace "map.get(key) == null" with "!map.containsKey(key)"? For non-null-accepting maps they are equivalent. On 12/14/2012 9:52 AM, Doug Lea wrote: > > Back to this after several diversions... > > On 12/07/12 11:16, Doug Lea wrote: >>> Basic idea: defaults for function-accepting Map methods are solely >>> in terms of the 4 CM methods, which are in turn non-atomic for >>> non-CM.... >> > > Unfortunately the null-value ambiguity hits yet again when > moving from writing specs to writing default implementations. > (Have I mentioned lately how terrible it is to allow nulls? :-) > > The defaults for function-accepting methods must rely not > only on these 4 CHM methods, but also on get and/or containsKey. > For null-accepting maps, you need the pair of them > to default-implement (non-atomically) but for others, > you must not use the pair of them (just get) to propagate > the property that if putIfAbsent is thread-safe then so is > computeIfAbsent. 
> > The only way out I see is to even further sacrifice
> sensibility for null-accepting maps, by saying that the
> methods are allowed to treat absence and mapping to null
> identically and that the default implementation does so.
> Here's computeIfAbsent. Any complaints?
>
>     /**
>      * If the specified key is not already associated with a value (or
>      * is mapped to {@code null}), attempts to compute its value using
>      * the given mapping function and enters it into the map unless
>      * {@code null}. The default implementation is equivalent to the
>      * following, then returning the current value or {@code null} if
>      * absent:
>      *
>      * <pre> {@code
>      * if (map.get(key) == null) {
>      *   V newValue = mappingFunction.apply(key);
>      *   if (newValue != null)
>      *      map.putIfAbsent(key, newValue);
>      * }}</pre>
>      *
>      * If the function returns {@code null} no mapping is recorded. If
>      * the function itself throws an (unchecked) exception, the
>      * exception is rethrown to its caller, and no mapping is
>      * recorded. The most common usage is to construct a new object
>      * serving as an initial mapped value or memoized result, as in:
>      *
>      * <pre> {@code
>      * map.computeIfAbsent(key, k -> new Value(f(k)));}</pre>
>      *
>      * <p>The default implementation makes no guarantees about
>      * synchronization or atomicity properties of this method or the
>      * application of the mapping function. Any class overriding this
>      * method must specify its concurrency properties. In particular,
>      * all implementations of subinterface {@link
>      * java.util.concurrent.ConcurrentMap} must document whether the
>      * function is applied once atomically only if the value is not
>      * present. Any class that permits null values must document
>      * whether and how this method distinguishes absence from null
>      * mappings.
>      *
>      * @param key key with which the specified value is to be associated
>      * @param mappingFunction the function to compute a value
>      * @return the current (existing or computed) value associated with
>      *         the specified key, or null if the computed value is null
>      * @throws NullPointerException if the specified key is null and
>      *         this map does not support null keys, or the
>      *         mappingFunction is null
>      * @throws UnsupportedOperationException if the put operation
>      *         is not supported by this map
>      * @throws ClassCastException if the class of the specified key or value
>      *         prevents it from being stored in this map
>      * @throws RuntimeException or Error if the mappingFunction does so,
>      *         in which case the mapping is left unestablished
>      */
>     default V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
>         V v, newValue;
>         return ((v = get(key)) == null &&
>                 (newValue = mappingFunction.apply(key)) != null &&
>                 (v = putIfAbsent(key, newValue)) == null) ?
newValue : v; > } > From dl at cs.oswego.edu Fri Dec 14 08:56:56 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 11:56:56 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB57DA.9040908@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB57DA.9040908@oracle.com> Message-ID: <50CB5A58.1020107@cs.oswego.edu> On 12/14/12 11:46, Brian Goetz wrote: > But, I am not quite following your argument. Why does it fail if you replace > "map.get(key) == null" with "!map.containsKey(key)"? For non-null-accepting > maps they are equivalent. > Consider the unlikely prospect of a null-accepting ConcurrentMap. And let's pick on computeIfPresent. If I try to trigger function on containsKey(key), then by the time I get(key) for function arg, it could have been removed (and rechecking doesn't help). And if I do the opposite, trigger on (v = get(key)) != null || containsKey, then v could be wrong. Either way loses. 
-Doug From brian.goetz at oracle.com Fri Dec 14 08:57:13 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 11:57:13 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB4DB4.5090000@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> Message-ID: <50CB5A69.8020706@oracle.com> We do not have explicit parallel versions of forEach for anything yet. Existing forEach methods are inherently sequential. I still find the name "compute" very unsatisfying, since it carries overtones that computing is something that happens once. I think casting it as an alternate signature of merge (so there's merge(k, v, f) and merge(k, f)) would be better than the status quo. Alternatively, calling it "recompute" also seems to more accurately convey what is going on (and then renaming computeIfPresent to "recomputeIfPresent"), since it makes it clear that computation will happen every time it is called. On 12/14/2012 11:03 AM, Doug Lea wrote: > > I placed a full candidate version of Map.java at > http://gee.cs.oswego.edu/dl/wwwtmp/apis/Map.java > > At the moment I can't compile or javadoc this under current > setup (working on it...) but if anyone wants to do so please > let me know of problems. Otherwise, feel free to check in. > > I kept the four names, computeIfAbsent, computeIfPresent, > compute, and merge. I agree that these are not wonderful > but our experience with the many CHMV8 users is that people > seem OK with them. Two flurries of traffic didn't arrive > at anything a lot better.
The main constraint is that > many people are familiar with "computeIfAbsent", > and the other names mostly fall out from there. > > I included (and added spec for) forEach(BiBlock) that is > in the current lambda version. Not completely sure about > spec though. Are there explicit seq/par versions? > > > -Doug > > From brian.goetz at oracle.com Fri Dec 14 08:59:10 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 11:59:10 -0500 Subject: The implementation of default methods In-Reply-To: <50CB57A1.7060707@cs.oswego.edu> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> Message-ID: <50CB5ADE.2040709@oracle.com> Short answer: yes. Longer answer: the challenge is that these wordings differ from each other in subtle ways, which means either we must be very precise about exactly where these statements are, and their exact wording, or we need some more structure to the Javadoc so there's an obviously right place to put the words that make it clear what category you are in. On 12/14/2012 11:45 AM, Doug Lea wrote: > On 12/14/12 11:30, Brian Goetz wrote: > >> 1. Document "Implementation note: The default implementation >> currently..." > > As always, the fewer of these the better. In j.u/j.u.c, these > are used mostly for resource limitations (like max threads in FJP) > that might someday be lifted. > >> >> 2. Document "The default implementation behaves as if..." (Or >> whatever Doug's >> proposed wording is.) > > In j.u.c, we always say "is behaviorally equivalent to" but I dropped > the "behaviorally" in Map candidate because someone once told me > it was overly pedantic :-) > >> >> 3. Document "The default implementation MUST" > > Isn't this just the normal spec part, that should precede the default > implementation part? 
> > -Doug > From dl at cs.oswego.edu Fri Dec 14 09:56:05 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 12:56:05 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB5A69.8020706@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> Message-ID: <50CB6835.4000203@cs.oswego.edu> On 12/14/12 11:57, Brian Goetz wrote: > We do not have explicit parallel versions of forEach for anything yet. Existing > forEach methods are inherently sequential. But does any spec promise this? > I still find the name "compute" very unsatisfying, As I mentioned, I initially didn't like these names at all. Pasted here for convenience: V computeIfAbsent(K key, Function f); V computeIfPresent(K key, BiFunction f); V compute(K key, BiFunction f); V merge(K key, V value, BiFunction f); Starting from scratch, I might call computeIfAbsent "establish". (Three years ago, when considering a MapMaker-like j.u.c class, I proposed several names along these lines but eventually gave up.) But given computeIfAbsent, the name computeIfPresent seems forced, and then computeIfAbsentOrPresent==compute seems forced. And if you see the scheme laid out in this way, looks OK. Which is presumably why all the CHMV8 users seem to like it. > I still find the name "compute" very unsatisfying, since it carries overtones that computing is something that happens once. I think either casting it as an alternate signature of merge (so there's merge(k, v, f) and merge(k, f)) would be better than the status quo. 
The main problem with "merge" for this method is that it obscures the fact that it can act as either computeIfAbsent or computeIfPresent (assuming a function that distinguishes null arg). -Doug From brian.goetz at oracle.com Fri Dec 14 10:02:36 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 13:02:36 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB6835.4000203@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> Message-ID: <50CB69BC.3010202@oracle.com> >> We do not have explicit parallel versions of forEach for anything yet. >> Existing >> forEach methods are inherently sequential. > > But does any spec promise this? Our mantra has been "no transparent parallelism." So, I think it should promise this. If you want parallel forEach you can do: coll.parallel().forEach() >> I still find the name "compute" very unsatisfying, > > As I mentioned, I initially didn't like these names at all. > Pasted here for convenience: > > V computeIfAbsent(K key, Function f); > V computeIfPresent(K key, BiFunction f); > V compute(K key, BiFunction f); > V merge(K key, V value, BiFunction f); > > Starting from scratch, I might call computeIfAbsent "establish". > (Three years ago, when considering a MapMaker-like j.u.c class, > I proposed several names along these lines but eventually gave up.) > > But given computeIfAbsent, the name computeIfPresent seems forced, > and then computeIfAbsentOrPresent==compute seems forced. And if you > see the scheme laid out in this way, looks OK. 
Which is presumably > why all the CHMV8 users seem to like it. What about recompute for compute, and recomputeIfPresent for computeIfPresent? (The former has a slight weirdness about the first time, since you can't recompute something that isn't yet computed, but that weirdness seems much less than the unapproachability of compute. And as far as I can tell, no one has considered these names yet. From dl at cs.oswego.edu Fri Dec 14 10:20:41 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 13:20:41 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB69BC.3010202@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> Message-ID: <50CB6DF9.1090700@cs.oswego.edu> On 12/14/12 13:02, Brian Goetz wrote: >>> We do not have explicit parallel versions of forEach for anything yet. >>> Existing >>> forEach methods are inherently sequential. >> >> But does any spec promise this? > > If you want parallel forEach you can do: > > coll.parallel().forEach() But what's up with Maps? >> But given computeIfAbsent, the name computeIfPresent seems forced, >> and then computeIfAbsentOrPresent==compute seems forced. And if you >> see the scheme laid out in this way, looks OK. Which is presumably >> why all the CHMV8 users seem to like it. > > What about recompute for compute, and recomputeIfPresent for computeIfPresent? I forget what c-i list feedback led me to change the initial CHMV8 recompute() to computeIfPresent() but I can scan archives/replies if anyone cares. 
> (The former has a slight weirdness about the first time, since you can't > recompute something that isn't yet computed, Sounds like it's a product of the Department of Redundancy Department :-) Sorry to resist all these suggestions, I'd list some of my own, but then I would be forced to reply to myself about why just living with "compute" is good enough :-) -Doug From joe.bowbeer at gmail.com Fri Dec 14 10:21:29 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Fri, 14 Dec 2012 10:21:29 -0800 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB69BC.3010202@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> Message-ID: I think that most of the initial users of the compute methods will already be familiar with the 'compute' names. I am, as is Doug. On Dec 14, 2012 10:02 AM, "Brian Goetz" wrote: > We do not have explicit parallel versions of forEach for anything yet. >>> Existing >>> forEach methods are inherently sequential. >>> >> >> But does any spec promise this? >> > > Our mantra has been "no transparent parallelism." So, I think it should > promise this. If you want parallel forEach you can do: > > coll.parallel().forEach() > > I still find the name "compute" very unsatisfying, >>> >> >> As I mentioned, I initially didn't like these names at all. 
>> Pasted here for convenience: >> >> V computeIfAbsent(K key, Function f); >> V computeIfPresent(K key, BiFunction >> f); >> V compute(K key, BiFunction f); >> V merge(K key, V value, BiFunction f); >> >> Starting from scratch, I might call computeIfAbsent "establish". >> (Three years ago, when considering a MapMaker-like j.u.c class, >> I proposed several names along these lines but eventually gave up.) >> >> But given computeIfAbsent, the name computeIfPresent seems forced, >> and then computeIfAbsentOrPresent==**compute seems forced. And if you >> see the scheme laid out in this way, looks OK. Which is presumably >> why all the CHMV8 users seem to like it. >> > > What about recompute for compute, and recomputeIfPresent for > computeIfPresent? (The former has a slight weirdness about the first time, > since you can't recompute something that isn't yet computed, but that > weirdness seems much less than the unapproachability of compute. And as far > as I can tell, no one has considered these names yet. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121214/b70eae5c/attachment.html From brian.goetz at oracle.com Fri Dec 14 10:27:10 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 13:27:10 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB6DF9.1090700@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> <50CB6DF9.1090700@cs.oswego.edu> Message-ID: <50CB6F7E.3000803@oracle.com> >>> But given computeIfAbsent, the name computeIfPresent seems forced, >>> and then computeIfAbsentOrPresent==compute seems forced. And if you >>> see the scheme laid out in this way, looks OK. Which is presumably >>> why all the CHMV8 users seem to like it. >> >> What about recompute for compute, and recomputeIfPresent for >> computeIfPresent? > > I forget what c-i list feedback led me to change the initial > CHMV8 recompute() to computeIfPresent() but I can scan archives/replies > if anyone cares. Would be useful if you could dig up. Very often the problem is not an issue with a specific name, but their relationships, which mean they sometimes need to be considered as a group. 
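For context, the four methods under discussion kept these names in the API that eventually shipped on java.util.Map. A minimal sketch of how the group reads in use (class name mine; behavior as specified for the Java 8 Map defaults):

```java
import java.util.HashMap;
import java.util.Map;

public class ComputeFamilyDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();

        // merge: insert the value if absent, otherwise combine with the function
        for (String w : new String[] {"a", "b", "a"}) {
            counts.merge(w, 1, Integer::sum);
        }
        System.out.println(counts.get("a")); // 2

        // computeIfAbsent: the function only runs when there is no mapping
        counts.computeIfAbsent("c", k -> 10);
        counts.computeIfAbsent("a", k -> 99); // no effect, "a" is present
        System.out.println(counts.get("c")); // 10
        System.out.println(counts.get("a")); // still 2

        // computeIfPresent: update-only; returning null would remove the entry
        counts.computeIfPresent("b", (k, v) -> v + 5);
        counts.computeIfPresent("missing", (k, v) -> v + 5); // no effect
        System.out.println(counts.get("b")); // 6

        // compute: always called; the old value is null when absent
        counts.compute("d", (k, v) -> (v == null) ? 1 : v + 1);
        System.out.println(counts.get("d")); // 1
    }
}
```

Seen side by side like this, merge covers the common counting/accumulating idiom without the lambda ever seeing a null, which is much of why it displaced compute in everyday use.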
From brian.goetz at oracle.com Fri Dec 14 10:32:00 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 13:32:00 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB6DF9.1090700@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> <50CB6DF9.1090700@cs.oswego.edu> Message-ID: <50CB70A0.3000606@oracle.com> >>>> We do not have explicit parallel versions of forEach for anything yet. >>>> Existing >>>> forEach methods are inherently sequential. >>> >>> But does any spec promise this? >> >> If you want parallel forEach you can do: >> >> coll.parallel().forEach() > > But what's up with Maps? There's always map.entrySet().parallel().forEach() if this is the only case we're really worried about. But, let's uplevel this. - What *other* operations do you want to expose in parallel other than forEach? - If you wanted to make the default of forEach parallel, how would the user ask for serial? What about other operations? The issue of "parallel collections" is one that we deliberately sidestepped as a simplifying scope-reduction choice. Everything else we're doing here is either a simple serial extension to existing collection semantics (e.g., Collection.removeAll) or explicitly part of the Stream framework. So, we don't have an answer; my preference would be to not invent one a month before feature-freeze if we don't have to. 
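The entrySet() route Brian mentions works directly against the stream API as it settled (using the name parallelStream(), proposed later in this thread; class name mine). The one wrinkle is that a side-effecting forEach over a parallel stream needs a thread-safe accumulator:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

public class ParallelMapForEach {
    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("a", 1);
        scores.put("b", 2);
        scores.put("c", 3);

        // Parallel traversal of a plain Map, via its entry set's parallel
        // stream. Entries may be visited from multiple fork/join worker
        // threads, so the accumulator must tolerate concurrent updates.
        LongAdder total = new LongAdder();
        scores.entrySet().parallelStream().forEach(e -> total.add(e.getValue()));
        System.out.println(total.sum()); // 6
    }
}
```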
From brian.goetz at oracle.com Fri Dec 14 10:33:09 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 13:33:09 -0500 Subject: Forms for reduce() -- part 1 In-Reply-To: <50C64BED.2050100@oracle.com> References: <50C64BED.2050100@oracle.com> Message-ID: <50CB70E5.7080500@oracle.com> I think I've got something on this. I think the number of new API abstractions can be collapsed down a bit but I've left it a little fluffy for clarity for the moment. Homogeneous forms: T reduce(T zero, BinaryOperator reducer); Optional reduce(BinaryOperator reducer); Nonhomogeneous forms (aka inject), in mutable and non-mutable flavors: U reduce(U zero, BiFunction accumulator, BinaryOperator reducer); R mutableReduce(Supplier seedFactory, BiBlock accumulator, BiBlock reducer); When these are needed, I suspect the mutable forms will generally be preferred to the functional ones (e.g., sumPages = docs.map(Document::getPageCount).reduce(0, Integer::sum)). This is in part because many catamorphisms can be described with map+reduce, and Java's paucity of usable immutable types other than primitives and strings makes the remainder easier with mutable accumulators, but this could improve in the future, which I think makes it worth keeping the functional version. Open question: do we keep the name mutableReduce to make it clear what is going on? For reasons that mostly have to do with the more complex cases (see below), I added a type to capture mutable reduction: public interface MutableReducer { R makeAccumulator(); void accumulate(R accumulator, T value); void combine(R accumulator, R other); } and some static factories for common mutable reductions like "create a new collection from the elements": static> MutableReducer intoCollection(Supplier collectionFactory) None of this is new. The new stuff is what replaces groupBy and reduceBy. For the time being I'm putting these in the category of "tabulation", even though these are really just reductions. 
(We can discuss whether the Tabulator abstractions carry their weight.) There are two approaches to tabulation, I'm calling them currently Tabulator and ConcurrentTabulator. The regular approach is like any other parallel reduce; break the input into chunks, compute the sub-result for each chunk, and merge the sub-results. The concurrent approach (suitable for groupBy/reduceBy applications IF your reducers are commutative and/or you don't care about encounter order) uses a single ConcurrentMap to accumulate the results, relying on the implementation's natural contention management to allow concurrent updates. Note that concurrent tabulation is more like forEach than like reduce. One thing that's new here is we move the concurrent/traditional choice explicitly to the user, rather than trying to guess from the ORDERED flag. This is good for a number of reasons, including that using the ORDERED flag is just a guess about what the user wants. Currently we've got two new abstractions: public interface Tabulator extends MutableReducer { } public interface ConcurrentTabulator { R makeAccumulator(); Block makeSink(R accumulator); } and two new Stream methods: R tabulate(Tabulator tabulator); R tabulate(ConcurrentTabulator tabulator); which replace groupBy / reduceBy (and also provide the equivalent of mapped (Stream+Function -> Map), partition, allow multi-level groupBy / reduceBy, allow groupBy/reduceBy to use targets like Guava multimap, etc.) The key new thing is the introduction of tabulator factory methods (currently in Tabulators and ConcurrentTabulators.) There are a lot of variants but most of them are one-line wrappers around a small number of general implementations. The simplest is groupBy: Tabulator groupBy(Function classifier) This is a simple groupBy -- take a Stream and a function T->K and produce a Map>. 
A variant takes supplier lambdas (generally constructor refs) if you want to customize what kind of Map and Collection are used (e.g., TreeMap>): Tabulator groupBy(Function classifier, Supplier mapFactory, Supplier rowFactory) The user uses this like: Map> byAuthor = docs.tabulate(groupBy(Doc::getAuthor)); Under the hood groupBy is really just a reducer. We can build what we've been calling reduceBy (awful name) on top of this, by providing a reducer to handle the "downstream" elements. We know there will always be at least one, so there's no need to provide a zero or handle Optional. Currently these are called groupBy but I think groupedReduce might be a better name? There are three useful forms: // homogeneous reduce Tabulator groupBy(Function classifier, BinaryOperator reducer) // map+reduce Tabulator groupBy(Function classifier, Function mapper, BinaryOperator reducer) // mutable reduce Tabulator groupBy(Function classifier, MutableReducer downstreamReducer) Because a Tabulator is a MutableReducer, the latter form allows multi-level group-by. Some examples, using this domain model: interface Txn { Buyer buyer(); Seller seller(); int amount(); Date date(); } // Transactions by buyer Map> m = txns.tabulate(groupBy(Txn::buyer)); // Sum of amounts by buyer Map m = txns.tabulate(groupBy(Txn::buyer, Txn::amount, Integer::sum)); // Most recent transaction by buyer Map m = txns.tabulate(groupBy(Txn::buyer, moreRecent)); Where moreRecent is defined using a new (one line) combinator in Comparators to take a Comparator and turn it into a BinaryOperator that chooses the larger: BinaryOperator moreRecent = Comparators.greaterOf(Comparators.comparing(Txn::date)); // Transactions by buyer, seller Map>> m = txns.tabulate(groupBy(Txn::buyer, groupBy(Txn::seller))); You can go arbitrarily deep with grouping and use any reduction you want at the bottom. Specializations can customize the Map type. Guava can provide tabulators that tabulate into multimaps. 
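For readers coming to this thread later: these tabulator factories are recognizably the ancestors of what shipped as java.util.stream.Collectors. A rough sketch of the Txn examples against that shipped API (the concrete Txn class and sample data here are mine, standing in for the interface above):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByDemo {
    static final class Txn {
        final String buyer;
        final int amount;
        Txn(String buyer, int amount) { this.buyer = buyer; this.amount = amount; }
        String buyer() { return buyer; }
        int amount() { return amount; }
    }

    public static void main(String[] args) {
        List<Txn> txns = Arrays.asList(
            new Txn("alice", 10), new Txn("bob", 5), new Txn("alice", 7));

        // Transactions by buyer: the simple one-level groupBy
        Map<String, List<Txn>> byBuyer =
            txns.stream().collect(Collectors.groupingBy(Txn::buyer));
        System.out.println(byBuyer.get("alice").size()); // 2

        // Sum of amounts by buyer: the map+reduce downstream form
        Map<String, Integer> sums = txns.stream().collect(
            Collectors.groupingBy(Txn::buyer,
                Collectors.reducing(0, Txn::amount, Integer::sum)));
        System.out.println(sums.get("alice")); // 17
        System.out.println(sums.get("bob"));   // 5
    }
}
```

The nesting property discussed above survived too: any Collector can be used as the downstream argument of groupingBy, so multi-level grouping composes the same way.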
So, that's the new and improved groupBy/reduceBy. But there's more. Tabulator mappedTo(Function mapper, BinaryOperator mergeFunction) This takes a Stream and a function T->U and produces a materialized Map. The merge function deals with duplicates in the stream, using the new Map.merge. Map> m = people.tabulate(p -> getHistory(p)); Partition is also trivial. Like groupBy, we don't want to just do a one-level partition; we want to do an arbitrary reduction on each half. Versions: Tabulator[]> partition(Predicate) Tabulator partition(Predicate, MutableReducer) Tabulator partition(Predicate, BinaryOperator) Tabulator partition(Predicate, Mapper, BinaryOperator) Because tabulators are reducers, you can nest these too: Map>[] partitioned = txns.tabulate(partition(pred, groupBy(Txn::buyer))); Map[]> byBuyerPartitioned = txns.tabulate(groupBy(Txn::buyer), partition(pred))); There are (or will be) concurrent versions of all of these too, in ConcurrentTabulators. To be checked in shortly. I think there's no question that this approach beats the pants off of the existing groupBy/reduceBy. But there are still many open issues, including: - How many forms of these can we tolerate? The explosion is already slightly unfortunate (though most are one-liners.) - Naming -- all names are up for discussion - MutableReducer - Tabulator - ConcurrentTabulator - groupBy vs groupedReduce - reduce vs mutableReduce - reduce vs tabulate - Is the current concurrent-vs-non the right way to introduce this choice? - Does this approach suggest a better way to do into()? On 12/10/2012 3:54 PM, Brian Goetz wrote: > I have been doing some brainstorming on forms for "fold". My primary > goals for revisiting this include: > > - As mentioned in an earlier note, I want to get Map and Collection > out of the Streams API (groupBy and reduceBy currently intrude these). > This message lays the groundwork for this and I will follow up on these > in a separate note. 
As I noted, there are many things currently wrong > with the current groupBy/reduceBy that I want to fix. > > - Support "mutable fold" cases better, where the "seed" is really a > mutable container (like a StringBuffer.) > > I'll start with use cases. There are some that fit purely into a > traditional functional model, and others that fit better into a mutable > model. While one can wedge one into the other, I think it may be better > to be explicit about both. > > I am not suggesting naming right now, they could all be called reduce, > though we may want to use different names to describe the functional vs > mutable cases. > > > Use cases -- purely functional > ------------------------------ > > 1. Homogeneous operations on monoid (e.g., sum). Here, there is a > monoid with a known zero. > > T reduce(T zero, BinaryOperator reducer) > > 2. Homogeneous operations on non-monoids (e.g., min). Here, there is > no sensible zero, so we use Optional to reflect "nothing there". Ideally > we would like to delay boxing to Optional until the very last operation > (in other words, use (boolean, T) as the internal state and box to > Optional at the very end.) > > Optional reduce(BinaryOperator reducer) > > 3. Nonhomogeneous operations (aka foldl, such as "sum of weights"). > This requires an additional combiner function for this to work in parallel. > > U reduce(U zero, (U,T) -> U reducer, (U,U -> U) combiner) > Optional reduce(T->U first, (U,T) -> U reducer, > (U,U -> U) combiner) > > Note that most cases where we might be inclined to return Optional > can be written as stream.map(T->U).reduce(BinaryOperator). > > Doug points out: if we went with "null means nothing", we wouldn't need > the optional forms. > > This is basically what we have now, though we're currently calling the > last form "fold". Doug has suggested we call them all reduce. > > Sub-question: people are constantly pointing out "but you don't need the > combiner for the serial case." 
My orientation here is that the serial > case is a special case, and while we want to ensure that those cases are > well-served, we don't necessarily want to distort the API to include > things that *only* work in the serial case. > > > Use cases -- mutable > -------------------- > > Many fold-like operations are better expressed with mutable state. We > could easily simulate them with the foldl form, but it may well be > better to call this form out specially. In these cases, there is also > often a distinct internal and external representation. I'll give them > the deliberately stupid name mReduce for now. > > The general form is: > > mReduce(Supplier makeEmpty, > BiBlock addElement, > BiBlock combineResults, > Function getFinalResult) > > Here, I is the intermediate form, and E is the result. There are many > cases where computations with an intermediate form are more efficient, so > we want to maintain the intermediate form for as long as possible -- > ideally until the last possible minute (when the whole reduction is done.) > > The analogue of reducer/combiner in the functional forms is "accept a > new element" (addElement) and "combine one intermediate form with > another" (combineResults). > > Examples: > > 3. Average. Here, we use an array of two ints to hold sum and > count. (Alternately we could use a custom tuple class.) Our > intermediate form is int[2] and our final form is Double. > > Double average = integers.mReduce(() -> new int[2], > (a, i) -> { a[0] += i; a[1]++ }, > (a, b) -> { a[0] += b[0]; a[1] += b[1] }, > a -> (double) a[0] / a[1]); > > Here, we maintain the int[2] form all the way throughout the > computation, including as we combine up the tree, and only convert to > double at the last minute. > > 4. String concatenation > > The signatures of the SAMs in mReduce were chosen to work with existing > builder-y classes such as StringBuffer or ArrayList. 
We can do string > concatenation using the functional form using String::concat, but it is > inefficient -- lots of copying as we go up the tree. We can still use a > mutable fold to do a concatenation with StringBuilder and mReduce. It > has the nice property that all the arguments already have methods that > have the right signature, so we can do it all with method refs. > > String s = strings.mReduce(StringBuilder::new, > StringBuilder::append, > StringBuilder::append, > StringBuilder::toString); > > In this example, the two append method refs are targeting different > versions of StringBuilder.append; the first is append(String) and the > second is append(StringBuilder). But the compiler will figure this out. > > 5. toArray > > We can express "toArray" as a mutable fold using ArrayList to accumulate > values and converting to an array at the end, just as with StringBuilder: > > Object[] array = foos.reduce(ArrayList::new, > ArrayList::add, > ArrayList::addAll, > ArrayList::toArray); > > There are other mutable reduction use cases too. For example, sort can > be implemented by providing a "insert in order" and a "merge sorted > lists" method. While these are not necessarily the most efficient > implementation, they may well make reasonable last-ditch defaults. > > Both of these examples use separate internal forms (StringBuffer, > ArrayList) and external forms (String, array). > > > Finally, for reasons that may become clearer in the next message, I > think we should consider having an abstraction for "Reducer" or > "Reduction" that captures all the bits needed for a reduction. This > would allow the averager above to be reused: > > double average = integers.reduce(Reducers.INT_AVERAGER); > > This turns into a win when we try to recast groupBy/reduceBy into being > general reductions (next message). 
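The mReduce shape above maps closely onto the three-argument Stream.collect(Supplier, BiConsumer, BiConsumer) that eventually shipped; collect has no fourth "finisher" argument, so the final intermediate-to-external mapping is applied by hand. A sketch of the averager and toArray examples in that form (class name mine):

```java
import java.util.ArrayList;
import java.util.stream.Stream;

public class MutableCollectDemo {
    public static void main(String[] args) {
        // Average: long[2] holds {sum, count}; chunks are combined pairwise,
        // and the array-to-double conversion happens at the very end
        long[] acc = Stream.of(1, 2, 3, 4)
            .collect(() -> new long[2],
                     (a, i) -> { a[0] += i; a[1]++; },
                     (a, b) -> { a[0] += b[0]; a[1] += b[1]; });
        System.out.println((double) acc[0] / acc[1]); // 2.5

        // toArray via an ArrayList accumulator, entirely with method refs
        Object[] array = Stream.of("x", "y")
            .collect(ArrayList::new, ArrayList::add, ArrayList::addAll)
            .toArray();
        System.out.println(array.length); // 2
    }
}
```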
> > So, summary: > > Functional forms: > > public U reduce(final U seed, final BinaryOperator op) { > > public Optional reduce(BinaryOperator op) { > > public R reduce(R base, Combiner reducer, > BinaryOperator combiner) { > > Mutable form: > > public R reduce(Supplier baseFactory, > BiBlock reducer, > BiBlock combiner, > Function finalResultMapper) { > > (and possibly a mutable form for special case where I=R) > > Possibly a form for a canned Reducer: > > public R reduce(Reducer reducer); > > > > From dl at cs.oswego.edu Fri Dec 14 11:08:56 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 14:08:56 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB6F7E.3000803@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> <50CB6DF9.1090700@cs.oswego.edu> <50CB6F7E.3000803@oracle.com> Message-ID: <50CB7948.7070101@cs.oswego.edu> On 12/14/12 13:27, Brian Goetz wrote: > Would be useful if you could dig up. Very often the problem is not an issue > with a specific name, but their relationships, which mean they sometimes need to > be considered as a group. > See September and October 2011 archives at: http://cs.oswego.edu/pipermail/concurrency-interest/ Plus a few scattered followups. On rescan I see that the sequence of events was first to support only computeIfAbsent. Then also a form of compute called recompute that initially didn't allow null arg or return so couldn't be used for initialization or removal. 
Then changing this but adding computeIfPresent to better support update-only usage. Then merge was added months later to better handle groupBy etc. Note that the only method you minimally need is compute. But the others avoid need for transient captures/lambdas, non-transparent effects, and extra work inside implementations. So it is likely in practice to be the least well used. In which case (sour grapes mode) having an odd name is not so bad. -Doug From brian.goetz at oracle.com Fri Dec 14 11:23:25 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 14:23:25 -0500 Subject: Constructing parallel streams In-Reply-To: <50C371C4.6000905@oracle.com> References: <50C371C4.6000905@oracle.com> Message-ID: <50CB7CAD.1000405@oracle.com> I've added .parallel() to Stream. The implementation is trivial: public Stream parallel() { if (isParallel()) return this; else { return Streams.parallel(spliterator(), getStreamFlags()); } } When applied at the head of a stream (e.g., String.chars().parallel()), the additional overhead is a few extra object instantiations, since we just copy the spliterator and unleash whatever parallelism it can offer. (We could consider additionally querying the spliterator, asking if it is splittable, and if not, getting the iterator and doing the "split the iterator" trick done by Iterators.spliterator.) When applied in the middle of a pipeline with stateless intermediate operations, the overhead is also pretty limited; we process the upstream ops serially (but jammed) and apply our "split the iterator" trick. When applied in the middle of a pipeline with stateful operations, we end up with a full serial computation through the last stateful op, and if there are any intervening stateless intermediate ops after that, it reduces to the previous case. I'm starting to think that Streamable is not carrying its weight. 
I'd like to consider dropping Streamable, at which point the base-most implementation of parallel() is in Collection, and I'd also suggest we consider renaming that to parallelStream(). Where that would leave us: - Persistent aggregates like Collection have two methods, stream() and parallelStream(); - Aggregate views of other classes are handled by accessor methods like String.codePoints(), and you get parallelism by asking the resulting stream for a .parallel() view. So Collection.parallelStream() becomes unnecessary, but a desirable optimization for a very common case. And the streaminess of the resulting value is clearer. On 12/8/2012 11:58 AM, Brian Goetz wrote: > Following up on the previous note, now that the stream-building APIs > have settled into something fairly simple and regular, I'm not > completely happy with the arrangement of the stream() / parallel() buns. > > For collections, stream() and parallel() seem fine; the user already has > a collection in hand, and can ask for a sequential or parallel stream. > (Separately: I'm starting to prefer stream() / parallelStream() as the > bun names here.) > > But, there are other ways to get a stream: > > String.chars() > Reader.lines() > regex.matches(source) > etc > > It seems pretty natural for these things to return Streams. But, in > accordance with our "no implicit parallelism" dictum, these streams are > serial. But many of these streams can be operated on in parallel -- so > the question is, how would we get a parallel stream out of these? > > One obvious choice is to have two operations for each of these: > > String.chars() > String.charsAsParallelStream() > > That's pretty ugly, and unlikely to be consistently implemented. 
Many sequential streams are constructed out of > spliterators that already know how to split (e.g., Arrays.spliterator), > and, we know how to expose some parallelism from otherwise sequential > data sources anyway (see implementation of Iterators.spliterator). Just > because iteration is sequential does not mean there is no exploitable > parallelism. > > > So, here's what I propose. Currently, we have a .sequential() > operation, which is a no-op on sequential streams and on parallel > streams acts as a barrier so that upstream computation can occur in > parallel but downstream computation can occur serially, in encounter > order (if defined), within-thread. We've also got a spliterator() > "escape hatch". > > We can add to these a .parallel() operations, which on parallel streams > is a no-op. The implementation is very simple and efficient (if applied > early on in the pipeline.) > > Here's the default implementation (which is probably good enough for all > cases): > > Stream parallel() { > if (isParallel()) > return this; > else > return Streams.parallel(spliterator(), getStreamFlags()); > } > > What makes this efficient is that if you apply this operation at the > very top of the pipeline, it just grabs the underlying spliterator, > wraps it in a new stream with the parallel flag set, and keeps going. > (If applied farther down the pipeline, spliterator() returns a > spliterator wrapped with the intervening operations.) > > > Bringing this back to our API, this enables us to have a .parallel() > operation on Stream, so users can say: > > string.chars().parallel()... > > if they want to operate on the characters in parallel. 
> > The default implementation of parallel / parallelStream in Streamable > could then be: > > default Stream parallel() { > return stream().parallel(); > } > > But I think it is still worth keeping the parallel / parallelStream bun > for collections since this is such an important use case (and is still > slightly more efficient; a few fewer object creations.) > From dl at cs.oswego.edu Fri Dec 14 11:54:04 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 14:54:04 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB70A0.3000606@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> <50CB6DF9.1090700@cs.oswego.edu> <50CB70A0.3000606@oracle.com> Message-ID: <50CB83DC.8050505@cs.oswego.edu> On 12/14/12 13:32, Brian Goetz wrote: > There's always > > map.entrySet().parallel().forEach() I'm guessing you also declared the Map version in because it can take BiBlocks, not Block? But if so, there's no way to get the par/seq distinction. Which might be OK because ... > > But, let's uplevel this. I'm not sure we want to, because we then hit some familiar territory that might be best avoided for JDK8: > - What *other* operations do you want to expose in parallel other than forEach? Well, there are the ones exported in CHM. See pre-JDK8 snapshot still at: http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/ConcurrentHashMap.html These make for a nice minimal set for a null-intolerant unordered concurrently-weakly-iterable concurrent map. 
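This spelling works essentially verbatim against the API that shipped, where CharSequence.chars() returns an IntStream; a small sketch (class name mine):

```java
public class CharsParallelDemo {
    public static void main(String[] args) {
        // Sequential by default: chars() hands back a serial IntStream
        long vowels = "hello world".chars()
            .filter(c -> "aeiou".indexOf(c) >= 0)
            .count();
        System.out.println(vowels); // 3

        // .parallel() is a cheap view switch on the same source, not a copy
        int max = "streams".chars().parallel().max().orElse(-1);
        System.out.println((char) max); // t
    }
}
```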
(Which is by far the most commonly encountered form for parallel bulk map ops. As in hadoop etc etc). But generalizing these to apply to all possible maps leads to a lot of snags. Many more than collections/streams. (As you apparently agreed when scrapping MapStream.) I had been thinking that the best move for JDK8 is for CHM itself to either only support parallel forms (as it does at the moment) or to offer both seq and par forms of these (foreach, mapped-foreach, search, reduce, mapped-reduce, and primitive-mapped-reduce, crossed by the four flavors: keys, values, entries, and Bi-forms), without trying to create new interfaces. This move would give people a chance to take interface/framework issues up again someday. But a near-term victim is that if you want to just add a common plain forEach(BiBlock), we have no good story about how to spec it to allow both seq and par forms, or a place to put it that would implicitly allow both forms. Maybe just scrap it? -Doug From brian.goetz at oracle.com Fri Dec 14 12:01:52 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 15:01:52 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB83DC.8050505@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> <50CB6DF9.1090700@cs.oswego.edu> <50CB70A0.3000606@oracle.com> <50CB83DC.8050505@cs.oswego.edu> Message-ID: <50CB85B0.70505@oracle.com> >> map.entrySet().parallel().forEach() > > I'm guessing you also declared the Map version in because > it can take 
BiBlocks, not Block? The main two reasons we flirted with MapStream were the above (want to deal in BiXxx instead of Xxx(Map.Entry)) and also that we wanted to have some finer-grained operations (like, reduce values associated with a given key.) But, reduceBy (and its spiritual descendents) eliminated the latter. Some combinators that take a Bi{Block,Predicate,Function} and produce a {B,P,F}(Map.Entry) could help with the former. For example: map.entrySet().stream() .forEach(forEntry((k,v) -> { ... }); This would trade O(n) "e.getKey()/getValue()" goo for O(1) wrapping goo. >> - What *other* operations do you want to expose in parallel other >> than forEach? > > Well, there are the ones exported in CHM. See pre-JDK8 snapshot still at: > http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/ConcurrentHashMap.html Are you willing to have .foo and .parallelFoo versions? > I had been thinking that the best move for JDK8 is for CHM itself > to either only support parallel forms (as it does at the moment) > or to offer both seq and par forms of these > (foreach, mapped-foreach, search, reduce, mapped-reduce, and > primitive-mapped-reduce, crossed by the four flavors: > keys, values, entries, and Bi-forms), without trying to > create new interfaces. Messy (for you) but probably the most option-preserving choice. > But a near-term victim is that if you want to just add a > common plain forEach(BiBlock), we have no good story about > how to spec it to allow both seq and par forms, or a place > to put it that would implicitly allow both forms. > > Maybe just scrap it? What about the stream form above plus a block adapter? 
From Donald.Raab at gs.com Fri Dec 14 12:02:51 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Fri, 14 Dec 2012 15:02:51 -0500 Subject: Forms for reduce() -- part 1 In-Reply-To: <50CB70E5.7080500@oracle.com> References: <50C64BED.2050100@oracle.com> <50CB70E5.7080500@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404BE2037D4@GSCMAMP09EX.firmwide.corp.gs.com> > Open question: do we keep the name mutableReduce to make it clear what > is going on? If you went without the name, I'm not sure how clear it will be with the existing types that one is mutating or non mutating. Would this make sense, or not so much? U reduce(U zero, BiFunction nonMutatingAccumulator, BinaryOperator nonMutatingReducer); R reduce(Supplier seedFactory, BiBlock mutatingAccumulator, BiBlock mutatingReducer); These are APIs we've had in a separate library for several years that we've moved into GS Collections RichIterable interface and which will become available in our 3.0 release. They are a combination of groupBy/injectInto so certainly have some differences to reduce, but they also have some similarities in terms of supporting mutating/non-mutating versions. We chose not to go with the name mutableAggregateBy, but instead named the parameter mutatingAggregator or nonMutatingAggregator. We felt the difference in type made it pretty clear (Function2 vs. Procedure2). /** * Applies an aggregate procedure over the iterable grouping results into a Map based on the specific groupBy function. * Aggregate results are required to be mutable as they will be changed in place by the procedure. A second function * specifies the initial "zero" aggregate value to work with (e.g. new AtomicInteger(0)). * * @since 3.0 */ MapIterable aggregateBy( Function groupBy, Function0 zeroValueFactory, Procedure2 mutatingAggregator); /** * Applies an aggregate function over the iterable grouping results into a map based on the specific groupBy function. 
* Aggregate results are allowed to be immutable as they will be replaced in place in the map. A second function * specifies the initial "zero" aggregate value to work with (e.g. new Integer(0)). * * @since 3.0 */ MapIterable aggregateBy( Function groupBy, Function0 zeroValueFactory, Function2 nonMutatingAggregator); From brian.goetz at oracle.com Fri Dec 14 12:15:16 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 15:15:16 -0500 Subject: Forms for reduce() -- part 1 In-Reply-To: <6712820CB52CFB4D842561213A77C05404BE2037D4@GSCMAMP09EX.firmwide.corp.gs.com> References: <50C64BED.2050100@oracle.com> <50CB70E5.7080500@oracle.com> <6712820CB52CFB4D842561213A77C05404BE2037D4@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50CB88D4.3040400@oracle.com> >> Open question: do we keep the name mutableReduce to make it clear what >> is going on? > > If you went without the name, I'm not sure how clear it will be with the existing types that one is mutating or non mutating. Would this make sense, or not so much? > > U reduce(U zero, > BiFunction nonMutatingAccumulator, > BinaryOperator nonMutatingReducer); > > R reduce(Supplier seedFactory, > BiBlock mutatingAccumulator, > BiBlock mutatingReducer); There's no reason this doesn't *work*. My concern was that calls of both will involve lambdas of two arguments, and it will be less obvious by reading the code whether (a, b) -> goo is being used as a functional reducer or a mutable accumulator. The parameter names will not be present at the use site, so readers of the code will have to reason about whether this is a functional reduce or a mutable inject. So, the question remains -- is the word "mutable" (or some other way of saying that) a helpful guide about what is being done here, or pedantic noise that will irritate the users? (Note that I think we should care much less how it makes people feel when *writing* code than how it helps comprehension when *reading* code.) 
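The call-site ambiguity described above is easy to see side by side. This sketch uses the released JDK 8 API, where the mutable form ultimately shipped as collect(Supplier, BiConsumer, BiConsumer) rather than under a mutableReduce name; the helper method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

public class ReduceForms {
    // Functional reduce: the (a, b) lambda returns a fresh value each time.
    static String joinFunctionally(List<String> words) {
        return words.stream().reduce("", (a, b) -> a + b);
    }

    // Mutable accumulation: the (a, b) lambdas are used for their side
    // effects, mutating the first argument in place.
    static String joinMutably(List<String> words) {
        return words.stream()
                .collect(StringBuilder::new,
                         (sb, s) -> sb.append(s),      // accumulator mutates sb
                         (sb1, sb2) -> sb1.append(sb2)) // combiner mutates sb1
                .toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");
        // Read as bare (a, b) -> ... lambdas, the functional and mutative
        // call sites look nearly identical -- the ambiguity at issue here.
        System.out.println(joinFunctionally(words)); // abc
        System.out.println(joinMutably(words));      // abc
    }
}
```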
> These are APIs we've had in a separate library for several years that we've moved into GS Collections RichIterable interface and which will become available in our 3.0 release. They are a combination of groupBy/injectInto so certainly have some differences to reduce, but they also have some similarities in terms of supporting mutating/non-mutating versions. We chose not to go with the name mutableAggregateBy, but instead named the parameter mutatingAggregator or nonMutatingAggregator. We felt the difference in type made it pretty clear (Function2 vs. Procedure2). The type does make it clear when the call site has "new Function() { ... }". Which of course described many call sites in a pre-lambda world. In a post-lambda world, the type is inferred, and both functions and blocks often look similar at the use site: (a, b) -> a+b (a, b) -> a.add(b) // where add(b) is void-bearing and mutative From tim at peierls.net Fri Dec 14 13:03:10 2012 From: tim at peierls.net (Tim Peierls) Date: Fri, 14 Dec 2012 16:03:10 -0500 Subject: Forms for reduce() -- part 1 In-Reply-To: <50CB88D4.3040400@oracle.com> References: <50C64BED.2050100@oracle.com> <50CB70E5.7080500@oracle.com> <6712820CB52CFB4D842561213A77C05404BE2037D4@GSCMAMP09EX.firmwide.corp.gs.com> <50CB88D4.3040400@oracle.com> Message-ID: On Fri, Dec 14, 2012 at 3:15 PM, Brian Goetz wrote: > So, the question remains -- is the word "mutable" (or some other way of >>> saying that) a helpful guide about what is being done here, or pedantic >>> noise that will irritate the users? (Note that I think we should care much >>> less how it makes people feel when *writing* code than how it helps >>> comprehension when *reading* code.) >> >> I don't think it's noise. It's going to be less common to come across "mutable" forms in practice, since they'll be harder to work with and will typically have to be wrapped in something more friendly. (That's my prediction, anyway.) 
So I think long descriptive names are fine here, like reduceThroughSideEffects or reduceHereBeDragons. --tim -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121214/1a4e9d52/attachment.html From brian.goetz at oracle.com Fri Dec 14 13:09:02 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 16:09:02 -0500 Subject: Forms for reduce() -- part 1 In-Reply-To: References: <50C64BED.2050100@oracle.com> <50CB70E5.7080500@oracle.com> <6712820CB52CFB4D842561213A77C05404BE2037D4@GSCMAMP09EX.firmwide.corp.gs.com> <50CB88D4.3040400@oracle.com> Message-ID: <50CB956E.6010508@oracle.com> So, right now most inject-style use cases will be mutable, since the containers we have (ArrayList, StringBuilder) are mutable. Over time, we will likely see more immutable data structures, so maybe over time the problem solves itself. On 12/14/2012 4:03 PM, Tim Peierls wrote: > On Fri, Dec 14, 2012 at 3:15 PM, Brian Goetz > wrote: > > So, the question remains -- is the word "mutable" (or some > other way of saying that) a helpful guide about what is > being done here, or pedantic noise that will irritate the > users? (Note that I think we should care much less how it > makes people feel when *writing* code than how it helps > comprehension when *reading* code.) > > > I don't think it's noise. It's going to be less common to come across > "mutable" forms in practice, since they'll be harder to work with > and will typically have to be wrapped in something more friendly. > (That's my prediction, anyway.) So I think long descriptive names are > fine here, like reduceThroughSideEffects or reduceHereBeDragons. 
> > --tim From tim at peierls.net Fri Dec 14 13:15:43 2012 From: tim at peierls.net (Tim Peierls) Date: Fri, 14 Dec 2012 16:15:43 -0500 Subject: Forms for reduce() -- part 1 In-Reply-To: <50CB956E.6010508@oracle.com> References: <50C64BED.2050100@oracle.com> <50CB70E5.7080500@oracle.com> <6712820CB52CFB4D842561213A77C05404BE2037D4@GSCMAMP09EX.firmwide.corp.gs.com> <50CB88D4.3040400@oracle.com> <50CB956E.6010508@oracle.com> Message-ID: That was an example of the use of "so" to mean "no", wasn't it? :-) So really grody names are out. Is there something better than reduceMutably? On Fri, Dec 14, 2012 at 4:09 PM, Brian Goetz wrote: > So, right now most inject-style use cases will be mutable, since the > containers we have (ArrayList, StringBuilder) are mutable. Over time, we > will likely see more immutable data structures, so maybe over time the > problem solves itself. > > > On 12/14/2012 4:03 PM, Tim Peierls wrote: > >> On Fri, Dec 14, 2012 at 3:15 PM, Brian Goetz > > wrote: >> >> So, the question remains -- is the word "mutable" (or some >> other way of saying that) a helpful guide about what is >> being done here, or pedantic noise that will irritate the >> users? (Note that I think we should care much less how it >> makes people feel when *writing* code than how it helps >> comprehension when *reading* code.) >> >> >> I don't think it's noise. It's going to be less common to come across >> "mutable" forms in practice, since they'll be harder to work with >> and will typically have to be wrapped in something more friendly. >> (That's my prediction, anyway.) So I think long descriptive names are >> fine here, like reduceThroughSideEffects or reduceHereBeDragons. >> >> --tim >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121214/251b0163/attachment.html From dl at cs.oswego.edu Fri Dec 14 13:23:48 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 16:23:48 -0500 Subject: Forms for reduce() -- part 1 In-Reply-To: References: <50C64BED.2050100@oracle.com> <50CB70E5.7080500@oracle.com> <6712820CB52CFB4D842561213A77C05404BE2037D4@GSCMAMP09EX.firmwide.corp.gs.com> <50CB88D4.3040400@oracle.com> <50CB956E.6010508@oracle.com> Message-ID: <50CB98E4.4030105@cs.oswego.edu> On 12/14/12 16:15, Tim Peierls wrote: > That was an example of the use of "so" to mean "no", wasn't it? :-) > > So really grody names are out. Is there something better than reduceMutably? This is exactly the same case as with CompletableFutures, where having actiony completions and functiony continuations are about equally common, so just overloading seemed fine to everyone? -Doug > > > On Fri, Dec 14, 2012 at 4:09 PM, Brian Goetz > wrote: > > So, right now most inject-style use cases will be mutable, since the > containers we have (ArrayList, StringBuilder) are mutable. Over time, we > will likely see more immutable data structures, so maybe over time the > problem solves itself. > > > On 12/14/2012 4:03 PM, Tim Peierls wrote: > > On Fri, Dec 14, 2012 at 3:15 PM, Brian Goetz > __>> wrote: > > So, the question remains -- is the word "mutable" (or some > other way of saying that) a helpful guide about what is > being done here, or pedantic noise that will irritate the > users? (Note that I think we should care much less how it > makes people feel when *writing* code than how it helps > comprehension when *reading* code.) > > > I don't think it's noise. It's going to be less common to come across > "mutable" forms in practice, since they'll be harder to work with > and will typically have to be wrapped in something more friendly. > (That's my prediction, anyway.) 
So I think long descriptive names are > fine here, like reduceThroughSideEffects or reduceHereBeDragons. > > --tim > > From brian.goetz at oracle.com Fri Dec 14 13:24:20 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 16:24:20 -0500 Subject: Forms for reduce() -- part 1 In-Reply-To: References: <50C64BED.2050100@oracle.com> <50CB70E5.7080500@oracle.com> <6712820CB52CFB4D842561213A77C05404BE2037D4@GSCMAMP09EX.firmwide.corp.gs.com> <50CB88D4.3040400@oracle.com> <50CB956E.6010508@oracle.com> Message-ID: <50CB9904.9080906@oracle.com> I thought I was using "so" to mean "I agree with your conclusion, but not your assumption, but here's an alternate reasoning that supports your conclusion anyway" :) I think "mutableReduce" is a fair balance between a small number of extra characters and some extra meaning. On 12/14/2012 4:15 PM, Tim Peierls wrote: > That was an example of the use of "so" to mean "no", wasn't it? :-) > > So really grody names are out. Is there something better than reduceMutably? > > > On Fri, Dec 14, 2012 at 4:09 PM, Brian Goetz > wrote: > > So, right now most inject-style use cases will be mutable, since the > containers we have (ArrayList, StringBuilder) are mutable. Over > time, we will likely see more immutable data structures, so maybe > over time the problem solves itself. > > > On 12/14/2012 4:03 PM, Tim Peierls wrote: > > On Fri, Dec 14, 2012 at 3:15 PM, Brian Goetz > > __>> wrote: > > So, the question remains -- is the word "mutable" > (or some > other way of saying that) a helpful guide about what is > being done here, or pedantic noise that will > irritate the > users? (Note that I think we should care much less > how it > makes people feel when *writing* code than how it helps > comprehension when *reading* code.) > > > I don't think it's noise.
It's going to be less common to come > across > "mutable" forms in practice, since they'll be harder to work with > and will typically have to be wrapped in something more friendly. > (That's my prediction, anyway.) So I think long descriptive > names are > fine here, like reduceThroughSideEffects or reduceHereBeDragons. > > --tim > > From tim at peierls.net Fri Dec 14 14:03:59 2012 From: tim at peierls.net (Tim Peierls) Date: Fri, 14 Dec 2012 17:03:59 -0500 Subject: Forms for reduce() -- part 1 In-Reply-To: References: <50C64BED.2050100@oracle.com> <50CB70E5.7080500@oracle.com> <6712820CB52CFB4D842561213A77C05404BE2037D4@GSCMAMP09EX.firmwide.corp.gs.com> <50CB88D4.3040400@oracle.com> <50CB956E.6010508@oracle.com> <50CB9904.9080906@oracle.com> Message-ID: [Oops, failed to Reply-All] On Fri, Dec 14, 2012 at 4:24 PM, Brian Goetz wrote: > I thought was using "so" to mean "I agree with your conclusion, but not > your assumption, but here's an alternate reasoning that supports your > conclusion anyway" :) > That's nearly exactly how I read it! > I think "mutableReduce" is a fair balance between a small number of extra > characters and some extra meaning. > It's nice if they sort near each other in javadocs, so "reduce" is better than "Reduce". --tim -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121214/42f40431/attachment.html From forax at univ-mlv.fr Fri Dec 14 14:47:49 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 14 Dec 2012 23:47:49 +0100 Subject: Constructing parallel streams In-Reply-To: <50CB7CAD.1000405@oracle.com> References: <50C371C4.6000905@oracle.com> <50CB7CAD.1000405@oracle.com> Message-ID: <50CBAC95.2010204@univ-mlv.fr> A quick question, why isParallel() is public, it seems it's an implementation details ? R?mi On 12/14/2012 08:23 PM, Brian Goetz wrote: > I've added .parallel() to Stream. 
The implementation is trivial: > > public Stream parallel() { > if (isParallel()) > return this; > else { > return Streams.parallel(spliterator(), getStreamFlags()); > } > } > > When applied at the head of a stream (e.g., > String.chars().parallel()), the additional overhead is a few extra > object instantiations, since we just copy the spliterator and unleash > whatever parallelism it can offer. (We could consider additionally > querying the spliterator, asking if it is splittable, and if not, > getting the iterator and doing the "split the iterator" trick done by > Iterators.spliterator.) > > When applied in the middle of a pipeline with stateless intermediate > operations, the overhead is also pretty limited; we process the > upstream ops serially (but jammed) and apply our "split the iterator" > trick. > > When applied in the middle of a pipeline with stateful operations, we > end up with a full serial computation through the last stateful op, > and if there are any intervening stateless intermediate ops after > that, it reduces to the previous case. > > > I'm starting to think that Streamable is not carrying its weight. I'd > like to consider dropping Streamable, at which point the base-most > implementation of parallel() is in Collection, and I'd also suggest we > consider renaming that to parallelStream(). > > Where that would leave is us: > - Persistent aggregates like Collection have two methods, stream() > and parallelStream(); > - Aggregate views of other classes are handled by accessor methods > like String.codePoints(), and you get parallelism by asking the > resulting stream for a .parallel() view. > > So Collection.parallelStream() becomes unnecessary, but a desirable > optimization for a very common case. And the streaminess of the > resulting value is clearer. 
> > On 12/8/2012 11:58 AM, Brian Goetz wrote: >> Following up on the previous note, now that the stream-building APIs >> have settled into something fairly simple and regular, I'm not >> completely happy with the arrangement of the stream() / parallel() buns. >> >> For collections, stream() and parallel() seem fine; the user already has >> a collection in hand, and can ask for a sequential or parallel stream. >> (Separately: I'm starting to prefer stream() / parallelStream() as the >> bun names here.) >> >> But, there are other ways to get a stream: >> >> String.chars() >> Reader.lines() >> regex.matches(source) >> etc >> >> It seems pretty natural for these things to return Streams. But, in >> accordance with our "no implicit parallelism" dictum, these streams are >> serial. But many of these streams can be operated on in parallel -- so >> the question is, how would we get a parallel stream out of these? >> >> One obvious choice is to have two operations for each of these: >> >> String.chars() >> String.charsAsParallelStream() >> >> That's pretty ugly, and unlikely to be consistently implemented. >> >> >> Now that the Streams construction API and internals have shaken out, >> another option has emerged. A Spliterator can be traversed sequentially >> or in parallel. Many sequential streams are constructed out of >> spliterators that already know how to split (e.g., Arrays.spliterator), >> and, we know how to expose some parallelism from otherwise sequential >> data sources anyway (see implementation of Iterators.spliterator). Just >> because iteration is sequential does not mean there is no exploitable >> parallelism. >> >> >> So, here's what I propose. Currently, we have a .sequential() >> operation, which is a no-op on sequential streams and on parallel >> streams acts as a barrier so that upstream computation can occur in >> parallel but downstream computation can occur serially, in encounter >> order (if defined), within-thread. 
We've also got a spliterator() >> "escape hatch". >> >> We can add to these a .parallel() operations, which on parallel streams >> is a no-op. The implementation is very simple and efficient (if applied >> early on in the pipeline.) >> >> Here's the default implementation (which is probably good enough for all >> cases): >> >> Stream parallel() { >> if (isParallel()) >> return this; >> else >> return Streams.parallel(spliterator(), getStreamFlags()); >> } >> >> What makes this efficient is that if you apply this operation at the >> very top of the pipeline, it just grabs the underlying spliterator, >> wraps it in a new stream with the parallel flag set, and keeps going. >> (If applied farther down the pipeline, spliterator() returns a >> spliterator wrapped with the intervening operations.) >> >> >> Bringing this back to our API, this enables us to have a .parallel() >> operation on Stream, so users can say: >> >> string.chars().parallel()... >> >> if they want to operate on the characters in parallel. >> >> The default implementation of parallel / parallelStream in Streamable >> could then be: >> >> default Stream parallel() { >> return stream().parallel(); >> } >> >> But I think it is still worth keeping the parallel / parallelStream bun >> for collections since this is such an important use case (and is still >> slightly more efficient; a few fewer object creations.) >> From brian.goetz at oracle.com Fri Dec 14 15:13:01 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 18:13:01 -0500 Subject: Constructing parallel streams In-Reply-To: <50CBAC95.2010204@univ-mlv.fr> References: <50C371C4.6000905@oracle.com> <50CB7CAD.1000405@oracle.com> <50CBAC95.2010204@univ-mlv.fr> Message-ID: <50CBB27D.7050909@oracle.com> Now that parallel/sequential are no-ops on streams of the correct orientation, we might be able to make it private. Before it was needed for implementations of addAll(Stream). But we can simplify that now. 
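The released JDK 8 API ended up very close to the proposal in this thread. A small sketch of both ways of obtaining a parallel stream, using the final method names (stream()/parallelStream() on Collection, and a parallel() view on a non-collection stream source):

```java
import java.util.Arrays;
import java.util.List;

public class ParallelViews {
    // Collection case: the two "buns", stream() and parallelStream().
    static int sum(List<Integer> nums, boolean parallel) {
        return (parallel ? nums.parallelStream() : nums.stream())
                .mapToInt(Integer::intValue)
                .sum();
    }

    // Non-collection source: the stream starts sequential, and the caller
    // asks the stream itself for a parallel view. String.chars() (an
    // IntStream in released JDK 8) stands in for the draft accessors above.
    static long countVowels(String s) {
        return s.chars().parallel()
                .filter(c -> "aeiou".indexOf(c) >= 0)
                .count();
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sum(nums, false)); // 15
        System.out.println(sum(nums, true));  // 15, computed in parallel
        System.out.println(countVowels("hello world")); // 3
    }
}
```

Applied at the head of the pipeline, parallel() is cheap, as described above: it just re-wraps the underlying spliterator with the parallel flag set.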
On 12/14/2012 5:47 PM, Remi Forax wrote: > A quick question, why isParallel() is public, it seems it's an > implementation details ? > > R?mi > > On 12/14/2012 08:23 PM, Brian Goetz wrote: >> I've added .parallel() to Stream. The implementation is trivial: >> >> public Stream parallel() { >> if (isParallel()) >> return this; >> else { >> return Streams.parallel(spliterator(), getStreamFlags()); >> } >> } >> >> When applied at the head of a stream (e.g., >> String.chars().parallel()), the additional overhead is a few extra >> object instantiations, since we just copy the spliterator and unleash >> whatever parallelism it can offer. (We could consider additionally >> querying the spliterator, asking if it is splittable, and if not, >> getting the iterator and doing the "split the iterator" trick done by >> Iterators.spliterator.) >> >> When applied in the middle of a pipeline with stateless intermediate >> operations, the overhead is also pretty limited; we process the >> upstream ops serially (but jammed) and apply our "split the iterator" >> trick. >> >> When applied in the middle of a pipeline with stateful operations, we >> end up with a full serial computation through the last stateful op, >> and if there are any intervening stateless intermediate ops after >> that, it reduces to the previous case. >> >> >> I'm starting to think that Streamable is not carrying its weight. I'd >> like to consider dropping Streamable, at which point the base-most >> implementation of parallel() is in Collection, and I'd also suggest we >> consider renaming that to parallelStream(). >> >> Where that would leave is us: >> - Persistent aggregates like Collection have two methods, stream() >> and parallelStream(); >> - Aggregate views of other classes are handled by accessor methods >> like String.codePoints(), and you get parallelism by asking the >> resulting stream for a .parallel() view. 
>> >> So Collection.parallelStream() becomes unnecessary, but a desirable >> optimization for a very common case. And the streaminess of the >> resulting value is clearer. >> >> On 12/8/2012 11:58 AM, Brian Goetz wrote: >>> Following up on the previous note, now that the stream-building APIs >>> have settled into something fairly simple and regular, I'm not >>> completely happy with the arrangement of the stream() / parallel() buns. >>> >>> For collections, stream() and parallel() seem fine; the user already has >>> a collection in hand, and can ask for a sequential or parallel stream. >>> (Separately: I'm starting to prefer stream() / parallelStream() as the >>> bun names here.) >>> >>> But, there are other ways to get a stream: >>> >>> String.chars() >>> Reader.lines() >>> regex.matches(source) >>> etc >>> >>> It seems pretty natural for these things to return Streams. But, in >>> accordance with our "no implicit parallelism" dictum, these streams are >>> serial. But many of these streams can be operated on in parallel -- so >>> the question is, how would we get a parallel stream out of these? >>> >>> One obvious choice is to have two operations for each of these: >>> >>> String.chars() >>> String.charsAsParallelStream() >>> >>> That's pretty ugly, and unlikely to be consistently implemented. >>> >>> >>> Now that the Streams construction API and internals have shaken out, >>> another option has emerged. A Spliterator can be traversed sequentially >>> or in parallel. Many sequential streams are constructed out of >>> spliterators that already know how to split (e.g., Arrays.spliterator), >>> and, we know how to expose some parallelism from otherwise sequential >>> data sources anyway (see implementation of Iterators.spliterator). Just >>> because iteration is sequential does not mean there is no exploitable >>> parallelism. >>> >>> >>> So, here's what I propose. 
Currently, we have a .sequential() >>> operation, which is a no-op on sequential streams and on parallel >>> streams acts as a barrier so that upstream computation can occur in >>> parallel but downstream computation can occur serially, in encounter >>> order (if defined), within-thread. We've also got a spliterator() >>> "escape hatch". >>> >>> We can add to these a .parallel() operations, which on parallel streams >>> is a no-op. The implementation is very simple and efficient (if applied >>> early on in the pipeline.) >>> >>> Here's the default implementation (which is probably good enough for all >>> cases): >>> >>> Stream parallel() { >>> if (isParallel()) >>> return this; >>> else >>> return Streams.parallel(spliterator(), getStreamFlags()); >>> } >>> >>> What makes this efficient is that if you apply this operation at the >>> very top of the pipeline, it just grabs the underlying spliterator, >>> wraps it in a new stream with the parallel flag set, and keeps going. >>> (If applied farther down the pipeline, spliterator() returns a >>> spliterator wrapped with the intervening operations.) >>> >>> >>> Bringing this back to our API, this enables us to have a .parallel() >>> operation on Stream, so users can say: >>> >>> string.chars().parallel()... >>> >>> if they want to operate on the characters in parallel. >>> >>> The default implementation of parallel / parallelStream in Streamable >>> could then be: >>> >>> default Stream parallel() { >>> return stream().parallel(); >>> } >>> >>> But I think it is still worth keeping the parallel / parallelStream bun >>> for collections since this is such an important use case (and is still >>> slightly more efficient; a few fewer object creations.) 
>>> > From dl at cs.oswego.edu Fri Dec 14 16:39:55 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 19:39:55 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CB85B0.70505@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> <50CB6DF9.1090700@cs.oswego.edu> <50CB70A0.3000606@oracle.com> <50CB83DC.8050505@cs.oswego.edu> <50CB85B0.70505@oracle.com> Message-ID: <50CBC6DB.509@cs.oswego.edu> On 12/14/12 15:01, Brian Goetz wrote: > Some combinators that take a Bi{Block,Predicate,Function} and produce a > {B,P,F}(Map.Entry) could help with the former. For example: > > map.entrySet().stream() > .forEach(forEntry((k,v) -> { ... }); > > This would trade O(n) "e.getKey()/getValue()" goo for O(1) wrapping goo. Although considering that many maps generate per-iteration Entry objects, this amounts to double-wrapping. I suppose the convenience outweighs the overhead for the audience most tempted to use it though. > >> I had been thinking that the best move for JDK8 is for CHM itself >> to either only support parallel forms (as it does at the moment) >> or to offer both seq and par forms of these >> (foreach, mapped-foreach, search, reduce, mapped-reduce, and >> primitive-mapped-reduce, crossed by the four flavors: >> keys, values, entries, and Bi-forms), without trying to >> create new interfaces. > > Messy (for you) but probably the most option-preserving choice. > Any preferences about which form? 
I omitted seq forms in the current version so as to release the par ones without stalling over whether they'd be segregated under something sharing a common interface. But now that this option seems out (no MapStreams) I'm left with the usual choices of how to name and/or parameterize them. Probably: forEach{Key,Value,Entry,}Sequentially / forEach{...}InParallel reduce{...}Sequentially / reduce{...}InParallel search{...}Sequentially / search{...}InParallel Luckily there are only three method name stems, all short enough that this is tolerable. Plus, the explicit "Sequentially"/"InParallel" help avoid clashes with several plausible future options. All of these would be in the CHM class itself. The CHM key, value, entry view classes as well as the CHM.newKeySet set projection will instead implement the stream API (which I hope to do next.) OK? -Doug From brian.goetz at oracle.com Fri Dec 14 16:47:34 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 14 Dec 2012 19:47:34 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CBC6DB.509@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> <50CB6DF9.1090700@cs.oswego.edu> <50CB70A0.3000606@oracle.com> <50CB83DC.8050505@cs.oswego.edu> <50CB85B0.70505@oracle.com> <50CBC6DB.509@cs.oswego.edu> Message-ID: <50CBC8A6.4080409@oracle.com> >> Messy (for you) but probably the most option-preserving choice. > > Any preferences about which form?
I omitted seq forms in > current version so as to release the par ones without > stalling over whether they'd be segregated under something > sharing common interface. But now that this option seems out > (no MapStreams) I'm left with the usual choices of how > to name and/or parameterize them. Probably: > forEach{Key,Value,Entry,}Sequentially / > forEach{...}InParallel > reduce{...}Sequentially / reduce{...}InParallel > search{...}Sequentially / search{...}InParallel Seems reasonable. I'd suggest leaving off Sequentially but you'll just ignore that :) From dl at cs.oswego.edu Fri Dec 14 16:58:53 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Dec 2012 19:58:53 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CBC8A6.4080409@oracle.com> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> <50CB6DF9.1090700@cs.oswego.edu> <50CB70A0.3000606@oracle.com> <50CB83DC.8050505@cs.oswego.edu> <50CB85B0.70505@oracle.com> <50CBC6DB.509@cs.oswego.edu> <50CBC8A6.4080409@oracle.com> Message-ID: <50CBCB4D.7030209@cs.oswego.edu> On 12/14/12 19:47, Brian Goetz wrote: >> (no MapStreams) I'm left with the usual choices of how >> to name and/or parameterize them. Probably: >> forEach{Key,Value,Entry,}Sequentially / >> forEach{...}InParallel >> reduce{...}Sequentially / reduce{...}InParallel >> search{...}Sequentially / search{...}InParallel > > Seems reasonable. 
I'd suggest leaving off Sequentially but you'll just ignore > that :) > The main reason is caution wrt future APIs by not using the prime real estate of plain "forEach". And it seems OK to do this here in CHM, since it is designed mainly for concurrency+parallelism anyway, so making the choice very explicit is more defensible than elsewhere. -Doug From david.holmes at oracle.com Fri Dec 14 22:50:24 2012 From: david.holmes at oracle.com (David Holmes) Date: Sat, 15 Dec 2012 16:50:24 +1000 Subject: The implementation of default methods In-Reply-To: <50CB57A1.7060707@cs.oswego.edu> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> Message-ID: <50CC1DB0.1040806@oracle.com> On 15/12/2012 2:45 AM, Doug Lea wrote: > On 12/14/12 11:30, Brian Goetz wrote: > >> 1. Document "Implementation note: The default implementation >> currently..." > > As always, the fewer of these the better. In j.u/j.u.c, these > are used mostly for resource limitations (like max threads in FJP) > that might someday be lifted. > >> >> 2. Document "The default implementation behaves as if..." (Or whatever >> Doug's >> proposed wording is.) > > In j.u.c, we always say "is behaviorally equivalent to" but I dropped > the "behaviorally" in Map candidate because someone once told me > it was overly pedantic :-) > >> >> 3. Document "The default implementation MUST" > > Isn't this just the normal spec part, that should precede the default > implementation part? I think not. The "normal spec" describes the abstract operation. "The default implementation MUST" specifies the concrete implementation. But it sounds like we do not intend to lock in what these default implementations do, so, for example, my version of j.u.Iterator.remove doesn't have to throw UnsupportedOperationException if I have some magic way of providing a default remove operation - correct? 
David > -Doug > From dl at cs.oswego.edu Sat Dec 15 03:42:30 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 15 Dec 2012 06:42:30 -0500 Subject: The implementation of default methods In-Reply-To: <50CC1DB0.1040806@oracle.com> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> Message-ID: <50CC6226.60208@cs.oswego.edu> On 12/15/12 01:50, David Holmes wrote: >>> >>> 2. Document "The default implementation behaves as if..." (Or whatever >>> Doug's >>> proposed wording is.) >> >> In j.u.c, we always say "is behaviorally equivalent to" but I dropped >> the "behaviorally" in Map candidate because someone once told me >> it was overly pedantic :-) >> >>> >>> 3. Document "The default implementation MUST" >> >> Isn't this just the normal spec part, that should precede the default >> implementation part? > > I think not. The "normal spec" describes the abstract operation. "The default > implementation MUST" specifies the concrete implementation. Sorry, I don't get it. If you say what the method requires and then say what default implementation is behaviorally equivalent to in terms of other methods (or imported functionality), you should in principle be done. The equivalence-based wording is critical though, and requires some hard work (and a little judgement) to get right. AbstractCollection (and other AbstractX's in j.u) specs include some now-regrettable wording saying exactly what they do rather than what they are equivalent to, which has prevented some improvements over the years. While I'm at it: It is currently a nuisance to get javadocs right when you override a method defaulted in AbstractCollection. Usually, no combination of @inheritDoc's will save you from copy/paste/hack to edit out the default implementation description while keeping the main spec. This will probably happen a lot when using defaulted implementations. 
-Doug From dl at cs.oswego.edu Sat Dec 15 04:01:29 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 15 Dec 2012 07:01:29 -0500 Subject: The implementation of default methods In-Reply-To: <50CC1DB0.1040806@oracle.com> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> Message-ID: <50CC6699.2000401@cs.oswego.edu> On 12/15/12 01:50, David Holmes wrote: >>> 2. Document "The default implementation behaves as if..." (Or whatever >>> Doug's >>> proposed wording is.) >> >> In j.u.c, we always say "is behaviorally equivalent to" but I dropped >> the "behaviorally" in Map candidate because someone once told me >> it was overly pedantic :-) >> Little style notes: 1. These seem to be more easily decodable when written in "third person" user-centered form. As in, for Map.putIfAbsent: /** * If the specified key is not already associated with a value, * associates it with the given value. The default implementation * is equivalent to, for this {@code map}: * *

     * <pre> {@code
     * if (!map.containsKey(key))
     *   return map.put(key, value);
     * else
     *   return map.get(key);}</pre>
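Doug's equivalence snippet can be checked as a free-standing method (a sketch only; the standalone static form and class name here are mine, not the proposed Map API):

```java
import java.util.HashMap;
import java.util.Map;

public class PutIfAbsentSketch {
    // Free-standing version of the behavior the javadoc snippet specifies.
    public static <K, V> V putIfAbsent(Map<K, V> map, K key, V value) {
        if (!map.containsKey(key))
            return map.put(key, value);   // no prior mapping: put, return null
        else
            return map.get(key);          // prior mapping wins; map unchanged
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        System.out.println(putIfAbsent(m, "k", 1));  // null
        System.out.println(putIfAbsent(m, "k", 2));  // 1
    }
}
```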
... 2. Often, little code snippets like this are the simplest way to say what you mean. But there's no strict need for this. Sometimes words describing effects are simpler. Or mixtures. I did this among other places in Map.merge to avoid the messiness of putting a retry loop in the code snippet. -Doug From forax at univ-mlv.fr Sat Dec 15 04:24:15 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 15 Dec 2012 13:24:15 +0100 Subject: The implementation of default methods In-Reply-To: <50CC6226.60208@cs.oswego.edu> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> <50CC6226.60208@cs.oswego.edu> Message-ID: <50CC6BEF.2000007@univ-mlv.fr> On 12/15/2012 12:42 PM, Doug Lea wrote: > On 12/15/12 01:50, David Holmes wrote: > >>>> >>>> 2. Document "The default implementation behaves as if..." (Or whatever >>>> Doug's >>>> proposed wording is.) >>> >>> In j.u.c, we always say "is behaviorally equivalent to" but I dropped >>> the "behaviorally" in Map candidate because someone once told me >>> it was overly pedantic :-) >>> >>>> >>>> 3. Document "The default implementation MUST" >>> >>> Isn't this just the normal spec part, that should precede the default >>> implementation part? >> >> I think not. The "normal spec" describes the abstract operation. "The >> default >> implementation MUST" specifies the concrete implementation. > > Sorry, I don't get it. If you say what the method requires and > then say what default implementation is behaviorally equivalent to > in terms of other methods (or imported functionality), > you should in principle be done. > > The equivalence-based wording is critical though, and requires some > hard work (and a little judgement) to get right. AbstractCollection > (and other AbstractX's in j.u) specs include some now-regrettable > wording saying exactly what they do rather than what they are equivalent > to, which has prevented some improvements over the years. 
> > While I'm at it: It is currently a nuisance to get
> javadocs right when you override a method defaulted in
> AbstractCollection. Usually, no combination of @inheritDoc's
> will save you from copy/paste/hack to edit out the default
> implementation description while keeping the main spec.
> This will probably happen a lot when using defaulted
> implementations.
that's a good reason to have a tag @implementation
then you can write
/**
 * @inheritDoc
 * @implementation implementation specific doc ...
 */
> > -Doug
>
Rémi
From brian.goetz at oracle.com Sat Dec 15 07:56:38 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 15 Dec 2012 10:56:38 -0500
Subject: The implementation of default methods
In-Reply-To: <50CC1DB0.1040806@oracle.com>
References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com>
Message-ID: <50CC9DB6.5090802@oracle.com>
For Iterator.remove, I think the real constraint is: the JDK must provide *a* default implementation (since this is only an issue for compatibility across JDKs). Since the only reasonable default would be to throw something, we might as well specify what is thrown, since this degree of freedom serves no one: the JDK must provide a default that throws UOE.
On 12/15/2012 1:50 AM, David Holmes wrote:
> On 15/12/2012 2:45 AM, Doug Lea wrote:
>> On 12/14/12 11:30, Brian Goetz wrote:
>>
>>> 1. Document "Implementation note: The default implementation
>>> currently..."
>>
>> As always, the fewer of these the better. In j.u/j.u.c, these
>> are used mostly for resource limitations (like max threads in FJP)
>> that might someday be lifted.
>>
>>>
>>> 2. Document "The default implementation behaves as if..." (Or whatever
>>> Doug's
>>> proposed wording is.)
>>
>> In j.u.c, we always say "is behaviorally equivalent to" but I dropped
>> the "behaviorally" in Map candidate because someone once told me
>> it was overly pedantic :-)
>>
>>>
>>> 3.
Document "The default implementation MUST" >> >> Isn't this just the normal spec part, that should precede the default >> implementation part? > > I think not. The "normal spec" describes the abstract operation. "The > default implementation MUST" specifies the concrete implementation. > > But it sounds like we do not intend to lock in what these default > implementations do, so, for example, my version of j.u.Iterator.remove > doesn't have to throw UnsupportedOperationException if I have some magic > way of providing a default remove operation - correct? > > David > >> -Doug >> From brian.goetz at oracle.com Sat Dec 15 08:03:24 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 15 Dec 2012 11:03:24 -0500 Subject: The implementation of default methods In-Reply-To: <50CC6BEF.2000007@univ-mlv.fr> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> <50CC6226.60208@cs.oswego.edu> <50CC6BEF.2000007@univ-mlv.fr> Message-ID: <50CC9F4C.9020105@oracle.com> > that a good reason to have a tag @implementation > then you can write I agree that having more structured places to put documentation information is probably helpful, but everyone talks about the @implementation tag as if it were a magic bullet, which it is not. The harder issues are the ones we're thrashing out now, such as separating normative statements about default implementations from non-normative ones. 
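For the record, javadoc later gained a structured slot for exactly this split: the @implSpec tag, rendered under an "Implementation Requirements" heading separate from the normative contract. A minimal sketch on a made-up interface — the interface and method here are hypothetical; only the tag is real:

```java
// Hypothetical interface: the @implSpec section documents the default
// implementation separately from the normative contract above it.
interface Counter {
    int count();

    /**
     * Returns whether anything has been counted yet.
     *
     * @implSpec
     * The default implementation is equivalent to {@code count() == 0}.
     */
    default boolean isFresh() {
        return count() == 0;
    }
}

public class ImplSpecSketch {
    public static void main(String[] args) {
        Counter c = () -> 3;              // lambda supplies count()
        System.out.println(c.isFresh());  // false
    }
}
```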
From dl at cs.oswego.edu Sat Dec 15 09:14:46 2012
From: dl at cs.oswego.edu (Doug Lea)
Date: Sat, 15 Dec 2012 12:14:46 -0500
Subject: The implementation of default methods
In-Reply-To: <50CC9DB6.5090802@oracle.com>
References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> <50CC9DB6.5090802@oracle.com>
Message-ID: <50CCB006.10805@cs.oswego.edu>
On 12/15/12 10:56, Brian Goetz wrote:
> For Iterator.remove, I think the real constraint is: the JDK must
> provide *a* default implementation (since this is only an issue for
> compatibility across JDKs). Since the only reasonable default would be
> to throw something, we might as well specify what is thrown, since this
> degree of freedom serves no one: the JDK must provide a default that
> throws UOE.
Pedantic mode: In which case the javadoc should say "always throws UOE". To be even more pedantic, it should say "always throws UOE without first doing anything else you should know about, like erasing your disk". But frame-axioms are usually implicit in these kinds of specs. (But not always...)
-Doug
From brian.goetz at oracle.com Sat Dec 15 09:26:27 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 15 Dec 2012 12:26:27 -0500
Subject: Streamable
Message-ID: <50CCB2C3.904@oracle.com>
I am considering dropping the Streamable interface. Currently the only implementor is Collection, and all of the other stream-bearing methods are serving up specialized streams (chars(), codePoints(), lines(), etc) with a method name that is more suitable than "stream". So I think we should drop Streamable and leave the stream() / parallel() methods on Collection (or possibly move them up to Iterable).
Also still leaning towards renaming parallel() to parallelStream().
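For reference, this is the naming that ultimately shipped in JDK 8: Collection kept stream() and the parallel factory became parallelStream(). A quick illustration of the pair:

```java
import java.util.Arrays;
import java.util.List;

public class StreamPairDemo {
    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        // Same pipeline either way; only the factory method differs.
        int seq = xs.stream().mapToInt(Integer::intValue).sum();
        int par = xs.parallelStream().mapToInt(Integer::intValue).sum();
        System.out.println(seq + " " + par);  // 10 10
    }
}
```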
From dl at cs.oswego.edu Sat Dec 15 09:31:37 2012
From: dl at cs.oswego.edu (Doug Lea)
Date: Sat, 15 Dec 2012 12:31:37 -0500
Subject: Streamable
In-Reply-To: <50CCB2C3.904@oracle.com>
References: <50CCB2C3.904@oracle.com>
Message-ID: <50CCB3F9.2030600@cs.oswego.edu>
On 12/15/12 12:26, Brian Goetz wrote:
> I am considering dropping the Streamable interface. Currently the only
> implementor is Collection, and all of the other stream-bearing methods are
> serving up specialized streams (chars(), codePoints(), lines(), etc) with a
> method name that is more suitable than "stream". So I think we should drop
> Streamable and leave the stream() / parallel() methods on Collection (or
> possibly move them up to Iterable).
Could care less :-)
>
> Also still leaning towards renaming parallel() to parallelStream().
>
Yes please!
-Doug
From brian.goetz at oracle.com Sat Dec 15 16:08:42 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 15 Dec 2012 19:08:42 -0500
Subject: toArray
Message-ID: <50CD110A.1070509@oracle.com>
Seems that the minimally invasive version of toArray (that doesn't propagate the horrible convention established by Collection, and yet doesn't foist Object[] on users) is:

interface Stream<T> {
    Object[] toArray();
    T[] toArray(Class<T> clazz);
}

It is unfortunate to need the Object[] version at all. However, code that is generic in T might be passed a Stream<T> and not know what class literal to use. It is further unfortunate that we cannot say

    <S super T> S[] toArray(Class<S> clazz)

as then such code could say toArray(Object.class), but we cannot (this is a limitation of generics.)
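A sketch of the two-method shape on a stand-alone helper (the List-based signatures and names are illustrative, not the Stream API). The typed form needs reflective array creation; the lower-bounded variant Brian alludes to is not expressible because method type variables admit only extends bounds:

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

public class ToArraySketch {
    // Erased fallback: always available to code that is generic in T.
    public static Object[] toArray(List<?> src) {
        return src.toArray();
    }

    // Typed form: the caller supplies the class literal. The cast is
    // unchecked, but sound whenever clazz really is T's runtime class.
    public static <T> T[] toArray(List<T> src, Class<T> clazz) {
        @SuppressWarnings("unchecked")
        T[] a = (T[]) Array.newInstance(clazz, src.size());
        for (int i = 0; i < src.size(); i++)
            a[i] = src.get(i);
        return a;
    }

    public static void main(String[] args) {
        String[] s = toArray(Arrays.asList("x", "y"), String.class);
        System.out.println(Arrays.toString(s));  // [x, y]
    }
}
```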
From forax at univ-mlv.fr Sun Dec 16 02:30:04 2012
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 16 Dec 2012 11:30:04 +0100
Subject: toArray
In-Reply-To: <50CD110A.1070509@oracle.com>
References: <50CD110A.1070509@oracle.com>
Message-ID: <50CDA2AC.1020105@univ-mlv.fr>
On 12/16/2012 01:08 AM, Brian Goetz wrote:
> Seems that the minimally invasive version of toArray (that doesn't
> propagate the horrible convention established by Collection, and yet
> doesn't foist Object[] on users) is:
>
> interface Stream<T> {
>     Object[] toArray();
>     T[] toArray(Class<T> clazz);
> }
>
> It is unfortunate to need the Object[] version at all. However, code
> that is generic in T might be passed a Stream<T> and not know what
> class literal to use. It is further unfortunate that we cannot say
>
> <S super T> S[] toArray(Class<S> clazz)
>
> as then such code could say toArray(Object.class), but we cannot (this
> is a limitation of generics.)
>
why not?

interface Stream<T> {
    <U> U[] toArray(Class<U> clazz);
}

Rémi
From brian.goetz at oracle.com Sun Dec 16 07:01:09 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 16 Dec 2012 10:01:09 -0500
Subject: toArray
In-Reply-To: <50CDA2AC.1020105@univ-mlv.fr>
References: <50CD110A.1070509@oracle.com> <50CDA2AC.1020105@univ-mlv.fr>
Message-ID: <50CDE235.8040200@oracle.com>
This works if you're willing to throw static type safety out the window; we have no compile-time guarantee that U[] is compatible with elements of type T.
On 12/16/2012 5:30 AM, Remi Forax wrote:
> On 12/16/2012 01:08 AM, Brian Goetz wrote:
>> Seems that the minimally invasive version of toArray (that doesn't
>> propagate the horrible convention established by Collection, and yet
>> doesn't foist Object[] on users) is:
>>
>> interface Stream<T> {
>>     Object[] toArray();
>>     T[] toArray(Class<T> clazz);
>> }
>>
>> It is unfortunate to need the Object[] version at all. However, code
>> that is generic in T might be passed a Stream<T> and not know what
>> class literal to use.
It is further unfortunate that we cannot say
>>
>> <S super T> S[] toArray(Class<S> clazz)
>>
>> as then such code could say toArray(Object.class), but we cannot (this
>> is a limitation of generics.)
>>
>
> why not?
>
> interface Stream<T> {
>     <U> U[] toArray(Class<U> clazz);
> }
>
> Rémi
From forax at univ-mlv.fr Sun Dec 16 07:09:37 2012
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 16 Dec 2012 16:09:37 +0100
Subject: toArray
In-Reply-To: <50CDE235.8040200@oracle.com>
References: <50CD110A.1070509@oracle.com> <50CDA2AC.1020105@univ-mlv.fr> <50CDE235.8040200@oracle.com>
Message-ID: <50CDE431.9010703@univ-mlv.fr>
On 12/16/2012 04:01 PM, Brian Goetz wrote:
> This works if you're willing to throw static type safety out the
> window; we have no compile-time guarantee that U[] is compatible with
> elements of type T.
yes, it's a pragmatic choice. A few years back, I realized that a lot of Java developers have never seen an ArrayStoreException, or have trouble remembering the last time they saw one.
Rémi
>
> On 12/16/2012 5:30 AM, Remi Forax wrote:
>> On 12/16/2012 01:08 AM, Brian Goetz wrote:
>>> Seems that the minimally invasive version of toArray (that doesn't
>>> propagate the horrible convention established by Collection, and yet
>>> doesn't foist Object[] on users) is:
>>>
>>> interface Stream<T> {
>>>     Object[] toArray();
>>>     T[] toArray(Class<T> clazz);
>>> }
>>>
>>> It is unfortunate to need the Object[] version at all. However, code
>>> that is generic in T might be passed a Stream<T> and not know what
>>> class literal to use. It is further unfortunate that we cannot say
>>>
>>> <S super T> S[] toArray(Class<S> clazz)
>>>
>>> as then such code could say toArray(Object.class), but we cannot (this
>>> is a limitation of generics.)
>>>
>>
>> why not?
>> >> interface Stream<T> {
>>     <U> U[] toArray(Class<U> clazz);
>> }
>>
>> Rémi
From dl at cs.oswego.edu Sun Dec 16 14:10:29 2012
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 16 Dec 2012 17:10:29 -0500
Subject: apply
Message-ID: <50CE46D5.2080307@cs.oswego.edu>
(Still in the midst of the painful setup allowing JDK8 j.u.c builds to track lambda-libs...)
Could someone please explain clearly and compellingly why we are using different method names for all the functional forms in java.util.function instead of just "apply"? Does anyone think that other users won't find this very annoying?
-Doug
From forax at univ-mlv.fr Sun Dec 16 14:30:46 2012
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 16 Dec 2012 23:30:46 +0100
Subject: apply
In-Reply-To: <50CE46D5.2080307@cs.oswego.edu>
References: <50CE46D5.2080307@cs.oswego.edu>
Message-ID: <50CE4B96.7020803@univ-mlv.fr>
On 12/16/2012 11:10 PM, Doug Lea wrote:
>
> (Still in the midst of the painful setup allowing
> JDK8 j.u.c builds to track lambda-libs...)
>
> Could someone please explain clearly and compellingly why we are using
> different method names for all the functional forms in java.util.function
> instead of just "apply"?
Problems come when a functional interface inherits from another, because in that case the two methods are considered overloads, and Java has specific rules for overloads: two methods with the same parameters can't differ only in return type. For example, if Supplier used apply, IntSupplier could not inherit from Supplier:

interface Supplier<T> { T apply(); }
interface IntSupplier extends Supplier<Integer> { int apply(); }  // won't compile

and we want IntSupplier to inherit from Supplier to avoid functional-interface-to-functional-interface conversion, which currently always creates a new object.
> Does anyone think that other users won't find this very annoying?
It's annoying for framework writers, not for users of those frameworks, because writing a lambda doesn't require knowing the method name.
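The clash Remi describes, and the escape hatch that a distinct primitive-flavored name provides, can be shown in compilable form (the interface names mirror his sketch; the default-method bridge is my addition, not his proposal):

```java
public class SamNamingDemo {
    public interface Supplier<T> { T apply(); }

    // With a distinct name for the primitive form, the specialization can
    // extend the generic one; a second abstract 'int apply()' could not.
    public interface IntSupplier extends Supplier<Integer> {
        int applyAsInt();
        default Integer apply() { return applyAsInt(); }  // boxing bridge
    }

    public static void main(String[] args) {
        IntSupplier s = () -> 42;          // lambda targets applyAsInt()
        Supplier<Integer> boxed = s;       // no adapter object needed
        System.out.println(s.applyAsInt() + " " + boxed.apply());  // 42 42
    }
}
```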
> > -Doug
Rémi
From dl at cs.oswego.edu Sun Dec 16 14:47:33 2012
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 16 Dec 2012 17:47:33 -0500
Subject: apply
In-Reply-To: <50CE4B96.7020803@univ-mlv.fr>
References: <50CE46D5.2080307@cs.oswego.edu> <50CE4B96.7020803@univ-mlv.fr>
Message-ID: <50CE4F85.6010208@cs.oswego.edu>
On 12/16/12 17:30, Remi Forax wrote:
>> Could someone please explain clearly and compellingly why we are using
>> different method names for all the functional forms in java.util.function
>> instead of just "apply"?
>
> Problems come when a functional interface inherits from another, because in that
> case the two methods are considered overloads, and Java has specific rules for
> overloads: two methods with the same parameters can't differ only in return type.
> For example, if Supplier used apply, IntSupplier could not inherit from Supplier:
>
> interface Supplier<T> { T apply(); }
> interface IntSupplier extends Supplier<Integer> { int apply(); }  // won't compile
>
> and we want IntSupplier to inherit from Supplier to avoid
> functional-interface-to-functional-interface conversion, which currently always
> creates a new object.
So instead live with a form that always forces you to remember to use the specially named method in the case of known-to-be-primitives?
>
>> Does anyone think that other users won't find this very annoying?
>
> It's annoying for framework writers, not for users of those frameworks, because
> writing a lambda doesn't require knowing the method name.
>
Where I guess "Framework writers" must mean:
1. anyone using higher-order functions
2. anyone needing to remember to, for example, use "operateAsDouble" vs the "operate" version in DoubleBinaryOperator to ensure lack of boxing.
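Doug's apply-vs-applyAsDouble suggestion below is essentially the convention that shipped: the fully primitive specializations expose applyAs* methods with primitive signatures, so no boxing occurs on either the parameters or the return:

```java
import java.util.function.DoubleBinaryOperator;

public class PrimitiveOpDemo {
    public static void main(String[] args) {
        // double in, double out: no Double objects are created.
        DoubleBinaryOperator max = (a, b) -> a >= b ? a : b;
        System.out.println(max.applyAsDouble(2.5, 7.5));  // 7.5
    }
}
```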
-Doug
From dl at cs.oswego.edu Sun Dec 16 15:00:56 2012
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 16 Dec 2012 18:00:56 -0500
Subject: apply
In-Reply-To: <50CE4F85.6010208@cs.oswego.edu>
References: <50CE46D5.2080307@cs.oswego.edu> <50CE4B96.7020803@univ-mlv.fr> <50CE4F85.6010208@cs.oswego.edu>
Message-ID: <50CE52A8.7010003@cs.oswego.edu>
On 12/16/12 17:47, Doug Lea wrote:
> Where I guess "Framework writers" must mean:
> 1. anyone using higher-order functions
> 2. anyone needing to remember to for example use "operateAsDouble"
> vs the "operate" version in DoubleBinaryOperator to ensure lack of boxing.
>
... but my question was, why do people need to remember two weird rules instead of one weird rule? Couldn't this just be apply vs applyAsDouble in DoubleBinaryOperator? And similarly for all of them?
-Doug
From dl at cs.oswego.edu Sun Dec 16 15:39:07 2012
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 16 Dec 2012 18:39:07 -0500
Subject: Binary Conversion functions
Message-ID: <50CE5B9B.1090707@cs.oswego.edu>
Are there existing interfaces or usage tricks for existing interfaces for what I had placeholded in ConcurrentHashMap for two non-primitive args -> primitive result:

/** Interface describing a function mapping two arguments to a double */
public interface ObjectByObjectToDouble<A, B> { double apply(A a, B b); }
/** Interface describing a function mapping two arguments to a long */
public interface ObjectByObjectToLong<A, B> { long apply(A a, B b); }
/** Interface describing a function mapping two arguments to an int */
public interface ObjectByObjectToInt<A, B> { int apply(A a, B b); }

If not, and no one thinks they belong in j.u.functions, I'll just keep them as local interfaces in CHM.
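With their type parameters restored, these placeholders are ordinary SAMs that a lambda can target directly (the interface is copied here so the sketch is self-contained):

```java
public class ObjToDoubleDemo {
    // Doug's CHM placeholder, with its type parameters restored.
    public interface ObjectByObjectToDouble<A, B> { double apply(A a, B b); }

    public static void main(String[] args) {
        ObjectByObjectToDouble<String, String> totalLength =
            (a, b) -> a.length() + b.length();
        System.out.println(totalLength.apply("fork", "join"));  // 8.0
    }
}
```

These shapes did eventually land in java.util.function as To{Int,Long,Double}BiFunction, with applyAs* method names.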
-Doug
From brian.goetz at oracle.com Sun Dec 16 16:42:25 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 16 Dec 2012 19:42:25 -0500
Subject: Binary Conversion functions
In-Reply-To: <50CE5B9B.1090707@cs.oswego.edu>
References: <50CE5B9B.1090707@cs.oswego.edu>
Message-ID: <50CE6A71.5050308@oracle.com>
These would be described as {Int,Double,Long}BiFunction. We don't have them but I'm open to adding them.
On 12/16/2012 6:39 PM, Doug Lea wrote:
>
> Are there existing interfaces or usage tricks for existing
> interfaces for what I had placeholded in ConcurrentHashMap
> for two non-primitive args -> primitive result:
>
> /** Interface describing a function mapping two arguments to a double */
> public interface ObjectByObjectToDouble<A, B> { double apply(A a, B b); }
> /** Interface describing a function mapping two arguments to a long */
> public interface ObjectByObjectToLong<A, B> { long apply(A a, B b); }
> /** Interface describing a function mapping two arguments to an int */
> public interface ObjectByObjectToInt<A, B> { int apply(A a, B b); }
>
> If not, and no one thinks they belong in j.u.functions,
> I'll just keep them as local interfaces in CHM.
>
> -Doug
From brian.goetz at oracle.com Sun Dec 16 18:07:43 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 16 Dec 2012 21:07:43 -0500
Subject: apply
In-Reply-To: <50CE46D5.2080307@cs.oswego.edu>
References: <50CE46D5.2080307@cs.oswego.edu>
Message-ID: <50CE7E6F.9040300@oracle.com>
OK, let's try to get this all in one place. Mostly a defense of the status quo, but there's some possibly good news at the end for apply fans.
1. Functional interfaces are special interfaces, but they are interfaces. They can be implemented with lambdas, but they can also be implemented by anonymous classes or implemented by classes with other methods. Choosing sensible method names is a key part of API design; there's a reason why we called the method in Runnable "run" and not "apply". Run is a more descriptive name.
One could design a library where all methods are named "Harold", but it would be hard to write and hard to use. And there are likely to be functional interfaces where "apply" is an outright terrible name for what the thing does, and the "all same" rule will feel pretty constraining.
2. Choosing nominal function types was a compromise -- a big one. Structural function types are better than nominal ones in almost every way. (Reminder: we went this way because erased and always-boxed structural function types would *really* suck.) One of the few advantages of nominal function types is that they have a name. Naming them all Harold seems like deciding that since we chose a route with disadvantages, we don't deserve any of the few remaining advantages.
3. Naming all SAM methods Harold slams the door on implementing many combinations of SAMs in a single class, or having certain SAMs extend each other. Again, in a pure functional world, the notion of implementing multiple function types doesn't make much sense. But, functional interfaces are interfaces, and objects may very well want to implement multiple functional interfaces. Naming them all Harold means that some combinations can be implemented this way, but some not, which is a half-here, half-there state of affairs. An obvious example is:

interface SupplierOfBoxedIntegers { Integer apply(); }
interface DispenserOfInts { int apply(); }

(First of all, apply is a terrible name for this class.) But more importantly, no class can implement both, even though it might be entirely reasonable for a class IntFlinger to implement all manner of "give me an integer" methods, to maximize compatibility with / minimize needed adaptation for multiple libraries that use a dispenser of integers. If they are all called Harold, IntFlinger is out of luck. In this example, the two functions that IntFlinger wants to implement are basically the same function modulo boxing.
But, sometimes you may want to implement multiple SAMs because their semantics are coupled through the semantics of your class. For example:

class MapMembershipArbiter<K, V> implements Predicate<K>, Function<K, V> {
    Map<K, V> m;
    boolean test(K k) { return m.containsKey(k); }
    V apply(K k) { return m.get(k); }
}

If both methods were named apply, you couldn't do this. Now, you might say "that's a stupid example" (and you might be right!) But, the "all different" rule allows for the possibility that this might actually be reasonable in some configuration; the "all same" rule ensures that this can never happen. That said, I am not unsympathetic (well, unless you don't use an IDE, in which case I'm completely unsympathetic.) I do find myself tripping over UnaryOperator.operate vs Function.apply since they're both just so functiony. And here's where there might be some good news for you. Since we currently have

interface UnaryOperator<T> extends Function<T, T>

then it actually is quite reasonable for UnaryOperator to adopt the name from Function, since there is no way to implement UnaryOperator without implementing Function. In which case some of the offenders -- Unary/BinaryOperator -- can go away. Similarly, SAMs of different arities can safely use the same name. So Function and BiFunction both use the same name.
On 12/16/2012 5:10 PM, Doug Lea wrote:
>
> (Still in the midst of the painful setup allowing
> JDK8 j.u.c builds to track lambda-libs...)
>
> Could someone please explain clearly and compellingly why we are using
> different method names for all the functional forms in java.util.function
> instead of just "apply"?
> Does anyone think that other users won't find this very annoying?
> > -Doug
From david.holmes at oracle.com Sun Dec 16 18:39:24 2012
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 17 Dec 2012 12:39:24 +1000
Subject: The implementation of default methods
In-Reply-To: <50CC9DB6.5090802@oracle.com>
References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> <50CC9DB6.5090802@oracle.com>
Message-ID: <50CE85DC.6020408@oracle.com>
On 16/12/2012 1:56 AM, Brian Goetz wrote:
> For Iterator.remove, I think the real constraint is: the JDK must
> provide *a* default implementation (since this is only an issue for
> compatibility across JDKs). Since the only reasonable default would be
> to throw something, we might as well specify what is thrown, since this
> degree of freedom serves no one: the JDK must provide a default that
> throws UOE.
One example does not a policy make. We have to address the general issue not, I hope, do a case-by-case analysis to see if we think any other default implementation is possible or reasonable. Put another way, do we say:
"This default implementation does ..."
or
"The default implementation does ..."
?
On a related but separate note, the "is equivalent to" approach has caused not insignificant confusion over the years because no one knows exactly what it means, and then we get bug reports because it's only equivalent in the base class, not in any subclasses (e.g. we call a private method that 'is equivalent to' calling a bunch of public methods, but we don't actually call them, so overriding in the subclass doesn't have the expected effect).
David
-----
> On 12/15/2012 1:50 AM, David Holmes wrote:
>> On 15/12/2012 2:45 AM, Doug Lea wrote:
>>> On 12/14/12 11:30, Brian Goetz wrote:
>>>
>>>> 1. Document "Implementation note: The default implementation
>>>> currently..."
>>>
>>> As always, the fewer of these the better.
In j.u/j.u.c, these >>> are used mostly for resource limitations (like max threads in FJP) >>> that might someday be lifted. >>> >>>> >>>> 2. Document "The default implementation behaves as if..." (Or whatever >>>> Doug's >>>> proposed wording is.) >>> >>> In j.u.c, we always say "is behaviorally equivalent to" but I dropped >>> the "behaviorally" in Map candidate because someone once told me >>> it was overly pedantic :-) >>> >>>> >>>> 3. Document "The default implementation MUST" >>> >>> Isn't this just the normal spec part, that should precede the default >>> implementation part? >> >> I think not. The "normal spec" describes the abstract operation. "The >> default implementation MUST" specifies the concrete implementation. >> >> But it sounds like we do not intend to lock in what these default >> implementations do, so, for example, my version of j.u.Iterator.remove >> doesn't have to throw UnsupportedOperationException if I have some magic >> way of providing a default remove operation - correct? >> >> David >> >>> -Doug >>> From josh at bloch.us Sun Dec 16 20:46:05 2012 From: josh at bloch.us (Joshua Bloch) Date: Sun, 16 Dec 2012 20:46:05 -0800 Subject: The implementation of default methods In-Reply-To: <50CE85DC.6020408@oracle.com> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> <50CC9DB6.5090802@oracle.com> <50CE85DC.6020408@oracle.com> Message-ID: Complexity meter buried deep in the red. Josh On Sun, Dec 16, 2012 at 6:39 PM, David Holmes wrote: > On 16/12/2012 1:56 AM, Brian Goetz wrote: > >> For Iterator.remove, I think the real constraint is: the JDK must >> provide *a* default implementation (since this is only an issue for >> compatibility across JDKs). Since the only reasonable default would be >> to throw something, we might as well specify what is thrown, since this >> degree of freedom serves noone: the JDK must provide a default that >> throws UOE. 
>> > > One example does not a policy make. We have to address the general issue > not, I hope, do a case-by-case analysis to see if we think any other > default implementation is possible or reasonable. Put another way do we say: > > "This default implementation does ..." > > or > > "The default implementation does ..." > > ? > > > On a related but separate note, the "is equivalent to" approach has caused > not insignificant confusion over the years because no one knows exactly > what it means and then we get bug reports because it's only equivalent in > the base class, not in any subclasses (eg we call a private method that 'is > equivalent to' calling a bunch of public methods but we don't actually call > them so overriding in the subclass doesn't have the expected affect). > > David > ----- > > > > On 12/15/2012 1:50 AM, David Holmes wrote: >> >>> On 15/12/2012 2:45 AM, Doug Lea wrote: >>> >>>> On 12/14/12 11:30, Brian Goetz wrote: >>>> >>>> 1. Document "Implementation note: The default implementation >>>>> currently..." >>>>> >>>> >>>> As always, the fewer of these the better. In j.u/j.u.c, these >>>> are used mostly for resource limitations (like max threads in FJP) >>>> that might someday be lifted. >>>> >>>> >>>>> 2. Document "The default implementation behaves as if..." (Or whatever >>>>> Doug's >>>>> proposed wording is.) >>>>> >>>> >>>> In j.u.c, we always say "is behaviorally equivalent to" but I dropped >>>> the "behaviorally" in Map candidate because someone once told me >>>> it was overly pedantic :-) >>>> >>>> >>>>> 3. Document "The default implementation MUST" >>>>> >>>> >>>> Isn't this just the normal spec part, that should precede the default >>>> implementation part? >>>> >>> >>> I think not. The "normal spec" describes the abstract operation. "The >>> default implementation MUST" specifies the concrete implementation. 
>>> >>> But it sounds like we do not intend to lock in what these default >>> implementations do, so, for example, my version of j.u.Iterator.remove >>> doesn't have to throw UnsupportedOperationException if I have some magic >>> way of providing a default remove operation - correct? >>> >>> David >>> >>> -Doug >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121216/8e61662d/attachment-0001.html From david.holmes at oracle.com Sun Dec 16 20:55:04 2012 From: david.holmes at oracle.com (David Holmes) Date: Mon, 17 Dec 2012 14:55:04 +1000 Subject: The implementation of default methods In-Reply-To: References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> <50CC9DB6.5090802@oracle.com> <50CE85DC.6020408@oracle.com> Message-ID: <50CEA5A8.1070304@oracle.com> On 17/12/2012 2:46 PM, Joshua Bloch wrote: > Complexity meter buried deep in the red. Any suggestion on how to move it? David > Josh > > On Sun, Dec 16, 2012 at 6:39 PM, David Holmes > wrote: > > On 16/12/2012 1:56 AM, Brian Goetz wrote: > > For Iterator.remove, I think the real constraint is: the JDK must > provide *a* default implementation (since this is only an issue for > compatibility across JDKs). Since the only reasonable default > would be > to throw something, we might as well specify what is thrown, > since this > degree of freedom serves noone: the JDK must provide a default that > throws UOE. > > > One example does not a policy make. We have to address the general > issue not, I hope, do a case-by-case analysis to see if we think any > other default implementation is possible or reasonable. Put another > way do we say: > > "This default implementation does ..." > > or > > "The default implementation does ..." > > ? 
> > > On a related but separate note, the "is equivalent to" approach has > caused not insignificant confusion over the years because no one > knows exactly what it means and then we get bug reports because it's > only equivalent in the base class, not in any subclasses (eg we call > a private method that 'is equivalent to' calling a bunch of public > methods but we don't actually call them so overriding in the > subclass doesn't have the expected affect). > > David > ----- > > > > On 12/15/2012 1:50 AM, David Holmes wrote: > > On 15/12/2012 2:45 AM, Doug Lea wrote: > > On 12/14/12 11:30, Brian Goetz wrote: > > 1. Document "Implementation note: The default > implementation > currently..." > > > As always, the fewer of these the better. In j.u/j.u.c, > these > are used mostly for resource limitations (like max > threads in FJP) > that might someday be lifted. > > > 2. Document "The default implementation behaves as > if..." (Or whatever > Doug's > proposed wording is.) > > > In j.u.c, we always say "is behaviorally equivalent to" > but I dropped > the "behaviorally" in Map candidate because someone once > told me > it was overly pedantic :-) > > > 3. Document "The default implementation MUST" > > > Isn't this just the normal spec part, that should > precede the default > implementation part? > > > I think not. The "normal spec" describes the abstract > operation. "The > default implementation MUST" specifies the concrete > implementation. > > But it sounds like we do not intend to lock in what these > default > implementations do, so, for example, my version of > j.u.Iterator.remove > doesn't have to throw UnsupportedOperationException if I > have some magic > way of providing a default remove operation - correct? 
> > David > > -Doug > > From josh at bloch.us Sun Dec 16 21:13:56 2012 From: josh at bloch.us (Joshua Bloch) Date: Sun, 16 Dec 2012 21:13:56 -0800 Subject: The implementation of default methods In-Reply-To: <50CEA5A8.1070304@oracle.com> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> <50CC9DB6.5090802@oracle.com> <50CE85DC.6020408@oracle.com> <50CEA5A8.1070304@oracle.com> Message-ID: On Sun, Dec 16, 2012 at 8:55 PM, David Holmes wrote: > On 17/12/2012 2:46 PM, Joshua Bloch wrote: > >> Complexity meter buried deep in the red. >> > > Any suggestion on how to move it? Dump default methods. Josh -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121216/40ab024c/attachment.html From david.holmes at oracle.com Sun Dec 16 21:23:01 2012 From: david.holmes at oracle.com (David Holmes) Date: Mon, 17 Dec 2012 15:23:01 +1000 Subject: The implementation of default methods In-Reply-To: References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> <50CC9DB6.5090802@oracle.com> <50CE85DC.6020408@oracle.com> <50CEA5A8.1070304@oracle.com> Message-ID: <50CEAC35.5040006@oracle.com> On 17/12/2012 3:13 PM, Joshua Bloch wrote: > On Sun, Dec 16, 2012 at 8:55 PM, David Holmes > wrote: > > On 17/12/2012 2:46 PM, Joshua Bloch wrote: > > Complexity meter buried deep in the red. > > > Any suggestion on how to move it? > > > Dump default methods. Not helpful. 
David > Josh > From dl at cs.oswego.edu Mon Dec 17 04:48:03 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 17 Dec 2012 07:48:03 -0500 Subject: ConcurrentHashMap/ConcurrentMap/Map.compute In-Reply-To: <50CBCB4D.7030209@cs.oswego.edu> References: <50BF406D.5000805@cs.oswego.edu> <50BFB835.6090707@cs.oswego.edu> <50BFBD90.4030701@cs.oswego.edu> <50C19840.6070909@oracle.com> <50C1E7FF.8040307@cs.oswego.edu> <50C1F344.3020905@oracle.com> <50C1F538.2010006@univ-mlv.fr> <50C1F8F1.6050701@cs.oswego.edu> <50C1FB99.5060103@oracle.com> <50C20286.90002@cs.oswego.edu> <50C21662.8050308@cs.oswego.edu> <50CB3D47.5050106@cs.oswego.edu> <50CB4DB4.5090000@cs.oswego.edu> <50CB5A69.8020706@oracle.com> <50CB6835.4000203@cs.oswego.edu> <50CB69BC.3010202@oracle.com> <50CB6DF9.1090700@cs.oswego.edu> <50CB70A0.3000606@oracle.com> <50CB83DC.8050505@cs.oswego.edu> <50CB85B0.70505@oracle.com> <50CBC6DB.509@cs.oswego.edu> <50CBC8A6.4080409@oracle.com> <50CBCB4D.7030209@cs.oswego.edu> Message-ID: <50CF1483.50509@cs.oswego.edu> A fixed-up successfully jdk8-javadoc'ed version of Map is now at http://gee.cs.oswego.edu/dl/wwwtmp/apis/Map.java with displayable javadoc at: http://gee.cs.oswego.edu/dl/wwwtmp/apis/java/util/Map.html Comments? Complaints? If not, feel free to integrate. (Arne: thanks for pointing out that "merge" was mangled.) Side note about this and a few other upcoming cases: We must keep j.u.c sources independent, so normally commit to jsr166 and then integrate into openjdk. But ad-hoc mechanics seem the only way to deal with updates to existing files. Handing off is likely nicer than directly committing into lambda. 
-Doug From dl at cs.oswego.edu Mon Dec 17 05:11:48 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 17 Dec 2012 08:11:48 -0500 Subject: The implementation of default methods In-Reply-To: <50CE85DC.6020408@oracle.com> References: <50CB2472.2000306@oracle.com> <50CB5425.6000403@oracle.com> <50CB57A1.7060707@cs.oswego.edu> <50CC1DB0.1040806@oracle.com> <50CC9DB6.5090802@oracle.com> <50CE85DC.6020408@oracle.com> Message-ID: <50CF1A14.8050302@cs.oswego.edu> On 12/16/12 21:39, David Holmes wrote: > On 16/12/2012 1:56 AM, Brian Goetz wrote: >> For Iterator.remove, I think the real constraint is: the JDK must >> provide *a* default implementation (since this is only an issue for >> compatibility across JDKs). Since the only reasonable default would be >> to throw something, we might as well specify what is thrown, since this >> degree of freedom serves noone: the JDK must provide a default that >> throws UOE. > > One example does not a policy make. We have to address the general issue not, I Especially the corner case of always throwing an exception, which generalizes only to those default implementations in which there is only one plausible way to do something, so the javadoc might as well say exactly what it does. > > On a related but separate note, the "is equivalent to" approach has caused not > insignificant confusion over the years because no one knows exactly what it > means I'd put the "exactly what it means" in the supporting docs (maybe technotes/guides/collections/index.html) that will need an overhaul anyway. It's not hard to rigorously define. -Doug From dl at cs.oswego.edu Mon Dec 17 05:25:32 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 17 Dec 2012 08:25:32 -0500 Subject: apply In-Reply-To: <50CE7E6F.9040300@oracle.com> References: <50CE46D5.2080307@cs.oswego.edu> <50CE7E6F.9040300@oracle.com> Message-ID: <50CF1D4C.7080401@cs.oswego.edu> On 12/16/12 21:07, Brian Goetz wrote: > Now, you might say "that's a stupid example" (and you might be right!) 
But, the > "all different" rule allows for the possibility that this might actually be > reasonable in some configuration; the "all same" rule ensures that this can > never happen. Penalizing everyone (in the sense of having to remember or rely on IDE to remember which of the many nearly-synonymous names are used for a given SAM) for the sake of allowing inadvisable and work-aroundable usages seems like a questionable tradeoff. > I do find myself tripping over > UnaryOperator.operate vs Function.apply since they're both just so functiony. Right. > And here's where there might be some good news for you. Since we currently have > > interface UnaryOperator extends Function > > then it actually is quite reasonable for UnaryOperator to adopt the name from > Function, since there is no way to implement UnaryOperator without implementing > Function. In which case some of the offenders -- Unary/BinaryOperator -- can go > away. Please. Also, it is amusing that in current lambda repo, Block has accept but BiBlock has apply. That's the one that led to my "OMG I gotta stop this nonsense!" posting. -Doug From brian.goetz at oracle.com Mon Dec 17 05:29:19 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 17 Dec 2012 08:29:19 -0500 Subject: apply In-Reply-To: <50CF1D4C.7080401@cs.oswego.edu> References: <50CE46D5.2080307@cs.oswego.edu> <50CE7E6F.9040300@oracle.com> <50CF1D4C.7080401@cs.oswego.edu> Message-ID: <50CF1E2F.6010808@oracle.com> >> And here's where there might be some good news for you. Since we >> currently have >> >> interface UnaryOperator extends Function >> >> then it actually is quite reasonable for UnaryOperator to adopt the >> name from >> Function, since there is no way to implement UnaryOperator without >> implementing >> Function. In which case some of the offenders -- Unary/BinaryOperator >> -- can go >> away. > > Please. > > Also, it is amusing that in current lambda repo, > Block has accept but BiBlock has apply. 
That's the one > that led to my "OMG I gotta stop this nonsense!" posting. Yeah, that's just a typo that's going to get fixed. From brian.goetz at oracle.com Mon Dec 17 07:13:27 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 17 Dec 2012 10:13:27 -0500 Subject: Fixing flatMap: bikeshed edition Message-ID: <50CF3697.1030101@oracle.com> I'm working on a better API for flatMap. But, part of fixing flatMap is the name. If we were mapping elements to Streams, flatMap would be a fine name, but that's not what we're doing (nor is it all that practical.) So the name is already a bit misleading. Candidates: - mapMulti - explode Others? From forax at univ-mlv.fr Mon Dec 17 07:13:07 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 17 Dec 2012 16:13:07 +0100 Subject: Fixing flatMap: bikeshed edition In-Reply-To: <50CF3697.1030101@oracle.com> References: <50CF3697.1030101@oracle.com> Message-ID: <50CF3683.6060203@univ-mlv.fr> On 12/17/2012 04:13 PM, Brian Goetz wrote: > I'm working on a better API for flatMap. But, part of fixing flatMap > is the name. If we were mapping elements to Streams, flatMap would be > a fine name, but that's not what we're doing (nor is it all that > practical.) So the name is already a bit misleading. > > Candidates: > - mapMulti > - explode > > Others? linearize ? R?mi From dl at cs.oswego.edu Mon Dec 17 07:17:21 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 17 Dec 2012 10:17:21 -0500 Subject: Fixing flatMap: bikeshed edition In-Reply-To: <50CF3697.1030101@oracle.com> References: <50CF3697.1030101@oracle.com> Message-ID: <50CF3781.4060604@cs.oswego.edu> On 12/17/12 10:13, Brian Goetz wrote: > I'm working on a better API for flatMap. But, part of fixing flatMap is the > name. If we were mapping elements to Streams, flatMap would be a fine name, but > that's not what we're doing (nor is it all that practical.) So the name is > already a bit misleading. > > Candidates: > - mapMulti > - explode > merge? mergedMap? 
-Doug From aleksey.shipilev at oracle.com Mon Dec 17 07:18:11 2012 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 17 Dec 2012 19:18:11 +0400 Subject: Fixing flatMap: bikeshed edition In-Reply-To: <50CF3697.1030101@oracle.com> References: <50CF3697.1030101@oracle.com> Message-ID: <50CF37B3.3020809@oracle.com> On 12/17/2012 07:13 PM, Brian Goetz wrote: > Others? Having the "rewriting" analogy in mind: - substitute[With] - replace[With] -Aleksey. From brian.goetz at oracle.com Mon Dec 17 19:18:08 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 17 Dec 2012 22:18:08 -0500 Subject: flatMap Message-ID: <50CFE070.5060705@oracle.com> So, of the names suggested here so far for flatMap, my favorite is the one inspired by Don -- mapMulti. It still sounds like map, is pretty clear what it's about (multi-valued map), and it steers clear of a lot of other pitfalls. While the bikeshed paint is still wet, we can talk about the API. Here's an improved proposal. This may not be perfect, but it's definitely better than what we have now. interface DownstreamContext<T> /* placeholder name */ { void yield(T element); void yield(T[] array); void yield(Collection<T> collection); void yield(Stream<T> stream); // can add more } interface Multimapper<T, U> /* placeholder name */ { void map(DownstreamContext<U> downstream, T element); } interface Stream<T> { ... <U> Stream<U> mapMulti(Multimapper<T, U> mapper); ... } This handles the "generator" case that the current API is built around, but also handles the other cases well too: Example 1 -- collection. foos.mapMulti((downstream, foo) -> downstream.yield(getBars(foo)))... Example 2 -- generator. ints.mapMulti((d, i) -> { for (int j=0; j<i; j++) d.yield(j); }); Example 3 -- stream. families.mapMulti((d, f) -> d.yield(adults.stream().filter(a -> isParent(a, f)))); The downstream context argument is still annoying, but I think is clearer than the current "sink" argument is. 
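The proposed shapes above can be exercised with a small stand-alone sketch. This is an editor's illustration, not the lambda-repo implementation: `DownstreamContext`, `Multimapper`, and the sequential `mapMulti` driver below are assumptions modeled on the placeholder names in the proposal, and only the single-element `yield` is included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapMultiSketch {
    // Placeholder names modeled on the proposal above.
    interface DownstreamContext<U> {
        void yield(U element);
    }

    interface Multimapper<T, U> {
        void map(DownstreamContext<U> downstream, T element);
    }

    // Illustrative sequential driver: feeds each element to the mapper and
    // collects everything the mapper yields downstream.
    static <T, U> List<U> mapMulti(List<T> source, Multimapper<T, U> mapper) {
        List<U> out = new ArrayList<>();
        DownstreamContext<U> downstream = out::add; // yield == collect
        for (T t : source) {
            mapper.map(downstream, t);
        }
        return out;
    }

    public static void main(String[] args) {
        // "Generator" case: element i expands to 0..i-1, so the lambda
        // controls how many values reach the downstream.
        List<Integer> expanded = mapMulti(Arrays.asList(1, 2, 3),
                (d, i) -> {
                    for (int j = 0; j < i; j++) {
                        d.yield(j);
                    }
                });
        System.out.println(expanded); // [0, 0, 1, 0, 1, 2]
    }
}
```

The single-abstract-method shape is what lets a lambda like `(d, i) -> ...` stand in for the mapper; the proposal's extra `yield` overloads for arrays, collections, and streams could then be layered on top of the element form.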
The alternative would be to have N special-purpose functional interfaces and N overloads for the non-generator cases (stream, collection) in addition to the current generator form. From dl at cs.oswego.edu Tue Dec 18 11:02:58 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Tue, 18 Dec 2012 14:02:58 -0500 Subject: Spliterator Message-ID: <50D0BDE2.2040402@cs.oswego.edu> Having gotten Stream-compatible Spliterators to "work" with ConcurrentHashMap view classes (keySet, values, entrySet), here's another try at recommending some reworking of the Spliterator interface. This time, I'm trying hard to make the smallest changes that support what Brian et al are already doing. (Although changing one of those things.) Pasted below (minus top-level javadoc, that would need at most a small touch-up.) Main issues: 1. At the moment, unless you implement "sizeIfKnown" of top-level Spliterator as non-negative, no splitting takes place, even if you implement estimatedSize. (Only implementing estimatedSize seemed like the right thing to do for CHM; among other reasons because its size might change while traversing. But I had to lie in getSizeIfKnown to make it do approximately the right thing.) So I tried to clearly spec the cases for (renamed) "exactSize" and "estimatedSize", and similarly spec (renamed) hasExactSplits. If we go with this (or even if not), the stream implementation should be changed accordingly. 2. Because split() is stateful, it is unclear at best what a return value > 1 for getNaturalSplits might mean? All the child splits and their descendants? Only the next #getNaturalSplits calls? Flailing around left me with the conclusion that the only sensible way to spec this is to rename as boolean isSplittable(), thus clearly referring only to the next call to split. Comments? Complaints? public interface Spliterator { /** * Returns a Spliterator covering some portion of the elements, * guaranteed not to overlap with those retained by this * Spliterator. 
After invoking this method, the current * Spliterator will not cover the elements of the * returned Spliterator. * *

This method may throw an IllegalStateException if a * traversal via {@link #iterator} or {@link #forEach} has already * commenced. * * @return a Spliterator covering some portion, possibly empty, of the * data structure elements. * @throws IllegalStateException if traversal has already commenced */ Spliterator split(); /** * Returns {@code false} if an invocation of {@code split()} is * guaranteed to return an empty Spliterator. Otherwise the method * implementation may choose a return value based on data * structure constraints and efficiency considerations. */ boolean isSplittable(); /** * Return the Iterator covering the remaining elements. The same * iterator instance must be returned for every invocation. This * method initiates the traversal phase.

* @return the iterator of the remaining elements. */ Iterator iterator(); /** * Performs the given action for all remaining elements. * * @param block The action */ default void forEach(Block block) { iterator().forEach(block); } /** * Returns the number of elements that would be encountered by an * {@link #iterator} or {@link #forEach} traversal, or returns a * negative value if unknown, or if computing this value may * itself require traversal or significant computation. */ default long exactSize() { return -1; } /** * Returns an estimate of the number of elements that would be * encountered by an {@link #iterator} or {@link #forEach} * traversal, or returns a negative value if unknown, or if * computing this value may itself require traversal or * significant computation. * *

 For example, a sub-spliterator of an approximately balanced * tree may return a value that estimates the number of elements * to be half of that of its parent. */ default long estimatedSize() { return exactSize(); } /** * Return {@code true} if the {@link #exactSize} method of this * Spliterator and all of those split from it return non-negative * results. */ boolean hasExactSplits(); } From brian.goetz at oracle.com Tue Dec 18 12:00:52 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 18 Dec 2012 15:00:52 -0500 Subject: Spliterator In-Reply-To: <50D0BDE2.2040402@cs.oswego.edu> References: <50D0BDE2.2040402@cs.oswego.edu> Message-ID: <50D0CB74.1040406@oracle.com> Great to see a concrete example in someone else's code! > 1. At the moment, unless you implement "sizeIfKnown" > of top-level Spliterator as non-negative, no splitting > takes place, even if you implement estimatedSize. > (Only implementing estimatedSize seemed like the right > thing to do for CHM; among other reasons because > its size might change while traversing. But I had > to lie in getSizeIfKnown to make it do approximately > the right thing.) Right, this is definitely an area where things can be cleaned up. It is not clear that the current model has the right set of buckets yet for size management, as you discovered. We had actually been thinking of just getting rid of estimateSize entirely, on the theory that (at least in the current codebase) nothing actually implements it with anything nontrivial, and if it did, the most likely implementation would be "parentSizeEstimate / (parentNaturalSplits+1)". Which is an estimate that AbstractTask is perfectly qualified to make equally well, without burdening Spliterators with more bookkeeping. So we had actually considered just dropping it (in part, in response to your earlier query about "do we need all these methods?") But, all of our collections are not trying to deal with concurrent modification; the non-interference assumption is strong. 
So for explicitly concurrent collections, estimateSize may well be the only thing you can offer up, in which case we do need to keep it around. > So I tried to clearly spec the cases for > (renamed) "exactSize" and "estimatedSize", and > similarly spec (renamed) hasExactSplits. > > If we go with this (or even if not), the stream > implementation should be changed accordingly. > > 2. Because split() is stateful, it is unclear at > best what a return value > 1 for getNaturalSplits > might mean? All the child splits and their descendents? > Only the next #getNaturalSplits calls? > Flailing around left me with the conclusion that > the only sensible way to spec this is to rename as > boolean isSplittable(), thus clearly referring only to the > next call to split. I appreciate calling attention to the spec hole here, but I think this specific mitigation suggestion throws the bath toys out with the bathwater. Specifically, the boolean-vs-int return. There's no reason it can't be spec'ed and generalized to more than one subsequent split (and if an implementation doesn't want to promise that, return 1.) First, getNaturalSplits is purely advisory, since you can always return an empty spliterator from split(). The only value is to hint towards a more balanced splitting for sources that may be better off splitting other than binary. To answer the question directly, getNaturalSplits() talks only about the next N calls to split() from this spliterator; it is saying "how many times should I call split() to achieve the most balanced splitting." Very often the answer will be 1. On the other hand, isPredictableSplits/hasExactSplits is an even stronger statement than you spec below -- it is saying that all spliterators returned from split() -- AND all spliterators returned from that, transitively -- have exact sizes. (I like the hasExactSplits name.) > Comments? Complaints? More fine-grained comments inline. 
> public interface Spliterator { > > /** > * Returns a Spliterator covering some portion of the elements, > * guaranteed not to overlap with those retained by this > * Spliterator. After invoking this method, the current > * Spliterator will not cover the elements of the > * returned Spliterator. > * > *

 This method may throw an IllegalStateException if a > * traversal via {@link #iterator} or {@link @forEach} has already > * commenced. > * > * @return a Spliterator covering the some portion, possibly empty, > of the > * data structure elements. > * @throws IllegalStateException if traversal has already commenced > */ > Spliterator split(); +1. > /** > * Returns {@code false} if an invocation of {@code split()} is > * guaranteed to return an empty Spliterator. Otherwise the method > * implementation may choose a return value based on data > * structure constraints and efficiency considerations. > */ > boolean isSplittable(); See above, I think this is less powerful than getNaturalSplits() for no benefit. How about something along the lines of: Return the number of calls to {@code split()} on this spliterator that will most naturally divide the remaining elements in a balanced manner. If this method returns 0, split() is guaranteed to return an empty Spliterator (where empty means also returning zero from getExactSize). We can provide further non-normative guidance to implementors: - If it is difficult to ascertain the number of splits that will be optimal, return 1 if you want to encourage further splitting and zero if you want to discourage further splitting. - It is always acceptable to return 1, since split may return an empty spliterator. This degenerates exactly into what you wrote if you only return 1 or 0 (which obviously you're content to do), and still leaves the door open for non-binary splits at what seems to me to be no cost. (I say no cost in part because I actually like the N-ary code in the framework better anyway than the binary code -- it is generally shorter and has less duplication, so it's not like this is imposing a cost on the framework either.) int getNaturalSplits(); > /** > * Return the Iterator covering the remaining elements. The same > * iterator instance must be returned for every invocation. This > * method initiates the traversal phase.

> * @return the iterator of the remaining elements. > */ > Iterator iterator(); > > /** > * Performs the given action for all remaining elements. > * > * @param block The action > */ > default void forEach(Block block) { > iterator().forEach(block); > } > > /** > * Returns the number of elements that would be encountered by an > * {@link #iterator} or {@link @forEach} traversal, or returns a > * negative value if unknown, or if computing this value may > * itself require traversal or significant computation. > */ > default long exactSize() { > return -1; > } +1 (I still like getExactSizeIfKnown, since it makes it clearer that you might return -1.) > /** > * Returns an estimate of the number of elements that would be > * encountered by an {@link #iterator} or {@link @forEach} > * traversal, or returns a negative value if unknown, or if > * computing this value may itself require traversal or > * significant computation. > * > *

 For example, a sub-spliterator of an approximately balanced > * tree may return a value that estimates the number of elements > * to be half of that of its parent. > */ > default long estimatedSize() { > return exactSize(); > } +1, though I'd hoped to get rid of estimatedSize entirely. > /** > * Return {@code true} if the {@link #exactSize} method of this > * Spliterator and all of those split from it return non-negative > * results. > */ > boolean hasExactSplits(); > } Needs stronger wording to bind not only to the splits you get from this spliterator, but all their children too. From brian.goetz at oracle.com Tue Dec 18 12:55:08 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 18 Dec 2012 15:55:08 -0500 Subject: Spliterator In-Reply-To: <50D0BDE2.2040402@cs.oswego.edu> References: <50D0BDE2.2040402@cs.oswego.edu> Message-ID: <50D0D82C.1070707@oracle.com> > 1. At the moment, unless you implement "sizeIfKnown" > of top-level Spliterator as non-negative, no splitting > takes place, even if you implement estimatedSize. This is not supposed to be true! So let's just call this a bug. From paul.sandoz at oracle.com Tue Dec 18 13:31:44 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 18 Dec 2012 22:31:44 +0100 Subject: Spliterator In-Reply-To: <50D0D82C.1070707@oracle.com> References: <50D0BDE2.2040402@cs.oswego.edu> <50D0D82C.1070707@oracle.com> Message-ID: <495C8B3F-BBAD-4855-9074-511572DDD4CB@oracle.com> On Dec 18, 2012, at 9:55 PM, Brian Goetz wrote: >> 1. At the moment, unless you implement "sizeIfKnown" >> of top-level Spliterator as non-negative, no splitting >> takes place, even if you implement estimatedSize. > > This is not supposed to be true! So let's just call this a bug. > I think I found the source as to why that is the case, it's in an ugly piece of code that suggests the size for which no further splits should be performed. Will fix it tomorrow if no one else gets to it before me. Paul. 
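To make the contract under discussion concrete, here is an editor's sketch of the draft interface over an array slice. The method names follow Doug's draft above; `ArraySpliterator` itself is an illustrative assumption, not code from the lambda repo, and for an array every size is known exactly, so `hasExactSplits` can honestly return true.

```java
import java.util.Arrays;
import java.util.Iterator;

// Editor's sketch: the draft Spliterator contract over an array slice,
// where splitting halves the remaining range and sizes are always exact.
class ArraySpliterator<T> {
    private final T[] array;
    private int lo;          // index of the next element this spliterator covers
    private final int hi;    // one past the last element covered
    private boolean traversing;

    ArraySpliterator(T[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    // Hands off the first half of the remaining range; per the draft spec,
    // this spliterator no longer covers the returned portion.
    ArraySpliterator<T> split() {
        if (traversing) {
            throw new IllegalStateException("traversal already commenced");
        }
        int mid = (lo + hi) >>> 1;
        ArraySpliterator<T> prefix = new ArraySpliterator<>(array, lo, mid);
        lo = mid;
        return prefix;
    }

    // An empty or single-element range can only yield an empty split.
    boolean isSplittable() {
        return hi - lo > 1;
    }

    long exactSize() {
        return hi - lo;      // always known for an array slice
    }

    boolean hasExactSplits() {
        return true;         // every split of an array slice is exactly sized
    }

    Iterator<T> iterator() {
        traversing = true;   // commencing traversal forbids further splits
        return Arrays.asList(array).subList(lo, hi).iterator();
    }
}
```

Splitting a four-element slice once leaves two halves each reporting `exactSize() == 2`. A concurrent source such as the ConcurrentHashMap views could offer only an estimate here, which is exactly the gap the `exactSize`/`estimatedSize` renaming is meant to spell out.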
From brian.goetz at oracle.com Tue Dec 18 19:36:14 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 18 Dec 2012 22:36:14 -0500 Subject: Spliterator In-Reply-To: <50D0BDE2.2040402@cs.oswego.edu> References: <50D0BDE2.2040402@cs.oswego.edu> Message-ID: <50D1362E.7010609@oracle.com> Committed recommended spec and method renamings that follow all of these, except changing the return type of getNaturalSplits. Committed new text for getNaturalSplits which hopefully should be acceptable? On 12/18/2012 2:02 PM, Doug Lea wrote: > > Having gotten Stream-compatible Spliterators to "work" > with ConcurrentHashMap view classes (keySet, values, entrySet), > here's a another try at recommending some reworking of > the Spliterator interface. > > This time, I'm trying hard to make the smallest > changes that support what Brian et al are already doing. > (Although changing one of those things.) > Pasted below (minus top-level javadoc, that would need > at most a small touch-ip.) > > Main issues: > > 1. At the moment, unless you implement "sizeIfKnown" > of top-level Spliterator as non-negative, no splitting > takes place, even if you implement estimatedSize. > (Only implementing estimatedSize seemed like the right > thing to do for CHM; among other reasons because > its size might change while traversing. But I had > to lie in getSizeIfKnown to make it do approximately > the right thing.) > > So I tried to clearly spec the cases for > (renamed) "exactSize" and "estimatedSize", and > similarly spec (renamed) hasExactSplits. > > If we go with this (or even if not), the stream > implementation should be changed accordingly. > > 2. Because split() is stateful, it is unclear at > best what a return value > 1 for getNaturalSplits > might mean? All the child splits and their descendents? > Only the next #getNaturalSplits calls? 
> Flailing around left me with the conclusion that > the only sensible way to spec this is to rename as > boolean isSplittable(), thus clearly referring only to the > next call to split. > > Comments? Complaints? > > > public interface Spliterator { > > /** > * Returns a Spliterator covering some portion of the elements, > * guaranteed not to overlap with those retained by this > * Spliterator. After invoking this method, the current > * Spliterator will not cover the elements of the > * returned Spliterator. > * > *

This method may throw an IllegalStateException if a > * traversal via {@link #iterator} or {@link @forEach} has already > * commenced. > * > * @return a Spliterator covering the some portion, possibly empty, > of the > * data structure elements. > * @throws IllegalStateException if traversal has already commenced > */ > Spliterator split(); > > /** > * Returns {@code false} if an invocation of {@code split()} is > * guaranteed to return an empty Spliterator. Otherwise the method > * implementation may choose a return value based on data > * structure constraints and efficiency considerations. > */ > boolean isSplittable(); > > /** > * Return the Iterator covering the remaining elements. The same > * iterator instance must be returned for every invocation. This > * method initiates the traversal phase.

> * @return the iterator of the remaining elements.
> */
> Iterator iterator();
>
> /**
> * Performs the given action for all remaining elements.
> *
> * @param block The action
> */
> default void forEach(Block block) {
> iterator().forEach(block);
> }
>
> /**
> * Returns the number of elements that would be encountered by an
> * {@link #iterator} or {@link #forEach} traversal, or returns a
> * negative value if unknown, or if computing this value may
> * itself require traversal or significant computation.
> */
> default long exactSize() {
> return -1;
> }
>
> /**
> * Returns an estimate of the number of elements that would be
> * encountered by an {@link #iterator} or {@link #forEach}
> * traversal, or returns a negative value if unknown, or if
> * computing this value may itself require traversal or
> * significant computation.
> *
> *

For example, a sub-spliterator of an approximately balanced > * tree may return a value that estimates the number of elements > * to be half of that of its parent. > */ > default long estimatedSize() { > return exactSize(); > } > > /** > * Return {@code true} if the {@link #exactSize} method of this > * Spliterator and all of those split from it return non-negative > * results. > */ > boolean hasExactSplits(); > } From tim at peierls.net Tue Dec 18 19:55:17 2012 From: tim at peierls.net (Tim Peierls) Date: Tue, 18 Dec 2012 22:55:17 -0500 Subject: Spliterator In-Reply-To: <50D1362E.7010609@oracle.com> References: <50D0BDE2.2040402@cs.oswego.edu> <50D1362E.7010609@oracle.com> Message-ID: On Tue, Dec 18, 2012 at 10:36 PM, Brian Goetz wrote: > Committed recommended spec and method renamings that follow all of these, > except changing the return type of getNaturalSplits. Committed new text > for getNaturalSplits which hopefully should be acceptable? If the text below is the latest, then I don't think it's ready yet. In the first paragraph, 0 means next split return guaranteed empty. In the second paragraph 0 means further splitting is discouraged, but it doesn't seem to require next split to return empty. "Discouraged" is a bit vauge, so why not add a parenthetical clarification at the end of the second paragraph, e.g., "... discourage further splitting (by advertising that the next call to split will return an empty Spliterator)." 51 * Return the number of calls to {@link #split} on this spliterator that will 52 * most naturally divide the remaining elements in a balanced manner. 53 * If this method returns 0, {@link #split} is guaranteed to return an 54 * empty {@code Spliterator}, where empty means also returning zero from 55 * {@link #exactSizeIfKnown}. 56 * 57 *

If it is difficult to ascertain the number of splits that will result in
58 * an optimal balance, implementations should return 1 if they wish to encourage
59 * further splitting and 0 if they wish to discourage further splitting.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121218/643f3838/attachment-0001.html
From tim at peierls.net Tue Dec 18 19:55:42 2012
From: tim at peierls.net (Tim Peierls)
Date: Tue, 18 Dec 2012 22:55:42 -0500
Subject: Spliterator
In-Reply-To: <50D1362E.7010609@oracle.com>
References: <50D0BDE2.2040402@cs.oswego.edu> <50D1362E.7010609@oracle.com>
Message-ID:

vauge -> vague

On Tue, Dec 18, 2012 at 10:36 PM, Brian Goetz wrote:

> Committed recommended spec and method renamings that follow all of these,
> except changing the return type of getNaturalSplits. Committed new text
> for getNaturalSplits which hopefully should be acceptable?
>
>
> On 12/18/2012 2:02 PM, Doug Lea wrote:
>>
>> Having gotten Stream-compatible Spliterators to "work"
>> with ConcurrentHashMap view classes (keySet, values, entrySet),
>> here's another try at recommending some reworking of
>> the Spliterator interface.
>>
>> This time, I'm trying hard to make the smallest
>> changes that support what Brian et al are already doing.
>> (Although changing one of those things.)
>> Pasted below (minus top-level javadoc, that would need
>> at most a small touch-up.)
>>
>> Main issues:
>>
>> 1. At the moment, unless you implement "sizeIfKnown"
>> of top-level Spliterator as non-negative, no splitting
>> takes place, even if you implement estimatedSize.
>> (Only implementing estimatedSize seemed like the right
>> thing to do for CHM; among other reasons because
>> its size might change while traversing. But I had
>> to lie in getSizeIfKnown to make it do approximately
>> the right thing.)
>> So I tried to clearly spec the cases for
>> (renamed) "exactSize" and "estimatedSize", and
>> similarly spec (renamed) hasExactSplits.
>>
>> If we go with this (or even if not), the stream
>> implementation should be changed accordingly.
>>
>> 2. Because split() is stateful, it is unclear at
>> best what a return value > 1 for getNaturalSplits
>> might mean? All the child splits and their descendants?
>> Only the next #getNaturalSplits calls?
>> Flailing around left me with the conclusion that
>> the only sensible way to spec this is to rename as
>> boolean isSplittable(), thus clearly referring only to the
>> next call to split.
>>
>> Comments? Complaints?
>>
>>
>> public interface Spliterator {
>>
>> /**
>> * Returns a Spliterator covering some portion of the elements,
>> * guaranteed not to overlap with those retained by this
>> * Spliterator. After invoking this method, the current
>> * Spliterator will not cover the elements of the
>> * returned Spliterator.
>> *
>> *

This method may throw an IllegalStateException if a
>> * traversal via {@link #iterator} or {@link #forEach} has already
>> * commenced.
>> *
>> * @return a Spliterator covering some portion, possibly empty,
>> of the
>> * data structure elements.
>> * @throws IllegalStateException if traversal has already commenced
>> */
>> Spliterator split();
>>
>> /**
>> * Returns {@code false} if an invocation of {@code split()} is
>> * guaranteed to return an empty Spliterator. Otherwise the method
>> * implementation may choose a return value based on data
>> * structure constraints and efficiency considerations.
>> */
>> boolean isSplittable();
>>
>> /**
>> * Return the Iterator covering the remaining elements. The same
>> * iterator instance must be returned for every invocation. This
>> * method initiates the traversal phase.

>> * @return the iterator of the remaining elements.
>> */
>> Iterator iterator();
>>
>> /**
>> * Performs the given action for all remaining elements.
>> *
>> * @param block The action
>> */
>> default void forEach(Block block) {
>> iterator().forEach(block);
>> }
>>
>> /**
>> * Returns the number of elements that would be encountered by an
>> * {@link #iterator} or {@link #forEach} traversal, or returns a
>> * negative value if unknown, or if computing this value may
>> * itself require traversal or significant computation.
>> */
>> default long exactSize() {
>> return -1;
>> }
>>
>> /**
>> * Returns an estimate of the number of elements that would be
>> * encountered by an {@link #iterator} or {@link #forEach}
>> * traversal, or returns a negative value if unknown, or if
>> * computing this value may itself require traversal or
>> * significant computation.
>> *
>> *

For example, a sub-spliterator of an approximately balanced
>> * tree may return a value that estimates the number of elements
>> * to be half of that of its parent.
>> */
>> default long estimatedSize() {
>> return exactSize();
>> }
>>
>> /**
>> * Return {@code true} if the {@link #exactSize} method of this
>> * Spliterator and all of those split from it return non-negative
>> * results.
>> */
>> boolean hasExactSplits();
>> }
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121218/0b37d5dc/attachment.html
From brian.goetz at oracle.com Tue Dec 18 20:19:11 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 18 Dec 2012 23:19:11 -0500
Subject: Getting rid of pull
Message-ID: <50D1403F.4010802@oracle.com>

Currently, stream pipelines can operate in one of two modes: push or pull. (This distinction is invisible to the user; the choice is made by the library based on a number of factors.)

Given a pipeline:

stream.filter(...).map(...).reduce(...)

A "pull" traversal involves taking an Iterator for the stream, wrapping a filtering iterator around it, wrapping a mapping iterator around that, and having the reduce loop pull elements from the upstream iterator and accumulate the result.

A "push" traversal involves creating a Sink for the reduction stage, wrapping a mapping sink around that, wrapping a filtering sink around that, and forEach'ing the elements from the source into the filtering sink, and then asking the reducing sink at the end for its result.

Currently, all operations are required to operate in both modes; to wrap a Sink or an Iterator.

Short-circuiting operations such as "findFirst" operate in "pull" mode, so as to be usable and performant even on infinite streams.

Most other operations operate in push mode, because it is more efficient.
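The two wrapping styles described above can be sketched standalone. Everything below is illustrative (hypothetical filterIterator/filterSink helpers with a plain Consumer standing in for the real Sink type, not the actual java.util.stream internals), but it shows why the pull side needs defensive, stateful code while the push side is a stateless one-liner:

```java
import java.util.*;
import java.util.function.*;

public class PushPull {
    // Pull: a filtering iterator is stateful -- it must buffer the next
    // matching element because hasNext() and next() split the probing work.
    public static <T> Iterator<T> filterIterator(Iterator<T> in, Predicate<T> p) {
        return new Iterator<T>() {
            T buffered;
            boolean has;
            public boolean hasNext() {
                while (!has && in.hasNext()) {
                    T t = in.next();
                    if (p.test(t)) { buffered = t; has = true; }
                }
                return has;
            }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                has = false;
                return buffered;
            }
        };
    }

    // Push: a filtering sink is stateless; it just forwards matches downstream.
    public static <T> Consumer<T> filterSink(Predicate<T> p, Consumer<T> downstream) {
        return t -> { if (p.test(t)) downstream.accept(t); };
    }

    public static int sumEvensPull(List<Integer> xs) {
        int sum = 0;
        for (Iterator<Integer> it = filterIterator(xs.iterator(), x -> x % 2 == 0); it.hasNext(); )
            sum += it.next();
        return sum;
    }

    public static int sumEvensPush(List<Integer> xs) {
        int[] acc = {0};  // the reducing sink at the end holds the only state
        xs.forEach(filterSink(x -> x % 2 == 0, x -> acc[0] += x));
        return acc[0];
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumEvensPull(xs) + " " + sumEvensPush(xs));  // 12 12
    }
}
```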
Iterators have more per-element overhead; you have to call two methods (hasNext and next), which generally must be defensively coded, requiring duplicated activity between these two methods. Pull mode also means more heap write traffic (and therefore bad interactions with cardmarking) than push mode, since iterators are nearly always stateful. So the above chain in "pull" mode requires three stateful iterators (and the accumulation is done in a local), whereas in "push" mode it is two stateless sinks and a stateful sink. (Pipelines ending in forEach are fully stateless, though the forEach target will have side-effects.) So we try hard to use push mode wherever possible.

Over the past few months, we've made a series of simplifications in the stream model, including restrictions against stream reuse. This has eliminated some of the use cases where pull was required, such as:

Iterator it = stream.iterator();
T first = it.next();
stream.forEach(block); // process cdr of stream

This used to be legal, and forced the forEach into pull mode because the source iterator has already been partially consumed. Now this case (and many of its relatives, such as forked streams) is illegal, eliminating one of the use cases for pull. These simplifications enable us to consider getting rid of direct support for pull entirely. If we eliminate support for pull in the framework, huge chunks of code dealing with Iterators simply go away. This is a really desirable outcome -- iterator code is bulky and ugly and error-prone.

The remaining use cases for "pull" are:

- Short-circuit operations, such as findFirst/findAny.
- Escape-hatch operations, whereby the client code requests an Iterator (or Spliterator) for the stream.

Examples of "escape hatch" usage might be "give me every second element" or "give me the third element after the first blue element." While these are important use cases to continue to support, they are likely to be a small percentage of actual traversals.
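One of the escape-hatch shapes just mentioned -- "give me every second element" -- needs nothing more than the stream's Iterator. A hypothetical helper (not a proposed API), written the way client code would:

```java
import java.util.*;

public class EscapeHatch {
    // Pull elements by hand, keeping positions 0, 2, 4, ...
    public static <T> List<T> everySecond(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        boolean keep = true;
        while (it.hasNext()) {
            T t = it.next();
            if (keep) out.add(t);
            keep = !keep;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(everySecond(Arrays.asList(1, 2, 3, 4, 5).iterator()));  // [1, 3, 5]
    }
}
```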
If we got rid of support for iterator wrapping, we could still simulate an Iterator as follows:

- Construct a terminal sink which accumulates elements into a buffer (such as that used in flatMap)
- Construct a push-oriented sink chain which flows into this buffer
- Get an Iterator for the source
- When more elements are needed, pull elements from the source, push them into the sink, and continue until out of elements or until something flows into the terminal buffering sink. Then consume elements from the buffering sink until gone, and if there are more elements remaining in the source, continue until the pipeline is dry. (Think of this as a "pushme-pullyou adapter.")

This has slightly higher overhead, but it seems pretty reasonable, and doesn't get used that often. And then, all the code having to do with wrapping iterators can go away, and iterators are then only used as sources and escape hatches (and internally within short-circuit operations.) Pretty nice.

We can even go one step farther, and eliminate iterators completely, but it involves some ugliness. The ugliness involves use of an exception to signal from a forEach target through the forEach source to the initiator of the forEach to stop dispensing elements. (We can eliminate most of the cost of the exception by not gathering the stack trace, which is by far the most expensive part.) Then, Spliterator need not even support an iterator() method, just a forEach method that doesn't fall apart if you throw an exception through it. (Similarly, we don't expose an iterator() method, just a version of forEach that lets the target say "no mas", which causes the exception to be thrown.) This would eliminate even more low-value code (and simplify the Spliterator spec) than the above approach. But the exception thing is kind of stinky.

I don't see any reason to not do at least the first one.
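The buffering simulation in the list above can be written as a small adapter. The names and shapes here are illustrative (a plain Consumer chain and ArrayDeque buffer standing in for the real Sink machinery), a sketch rather than the actual implementation:

```java
import java.util.*;
import java.util.function.*;

public class PushmePullyou {
    // Wrap a push pipeline as a pull Iterator: pull from the source, push into
    // the sink chain, and hand out whatever lands in the terminal buffer.
    public static <S, T> Iterator<T> asIterator(Iterator<S> source,
            Function<Consumer<T>, Consumer<S>> pipeline) {
        ArrayDeque<T> buffer = new ArrayDeque<>();        // terminal buffering sink
        Consumer<S> head = pipeline.apply(buffer::add);   // head of the sink chain
        return new Iterator<T>() {
            public boolean hasNext() {
                while (buffer.isEmpty() && source.hasNext())
                    head.accept(source.next());           // push until something flows through
                return !buffer.isEmpty();
            }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return buffer.remove();
            }
        };
    }

    public static List<Integer> demo() {
        // filter evens, then map x -> x * 10, expressed as a sink chain
        Iterator<Integer> it = asIterator(Arrays.asList(1, 2, 3, 4).iterator(),
                (Consumer<Integer> down) -> x -> { if (x % 2 == 0) down.accept(x * 10); });
        List<Integer> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());  // [20, 40]
    }
}
```

Note how the per-element cost only shows up when the client actually pulls: elements 1 and 3 are pushed through and dropped inside hasNext() without ever surfacing.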
From brian.goetz at oracle.com Tue Dec 18 21:03:02 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 19 Dec 2012 00:03:02 -0500
Subject: flatMap
In-Reply-To: <50CFE070.5060705@oracle.com>
References: <50CFE070.5060705@oracle.com>
Message-ID: <50D14A86.9050102@oracle.com>

Pushing a strawman implementation that uses the following API:

public interface MultiFunction<T, U> {
    public void apply(Collector<U> collector, T element);

    /** A collector for values associated with a given input.  Values can be
     * yielded individually, or in aggregates such as collections, arrays, or
     * streams; aggregates are flattened, so that yielding an array containing
     * [1, 2] is equivalent to yield(1); yield(2).
     */
    public interface Collector<U> {
        void yield(U element);

        default void yield(Collection<U> collection) {
            for (U u : collection)
                yield(u);
        }

        default void yield(U[] array) {
            for (U u : array)
                yield(u);
        }

        default void yield(Stream<U> stream) {
            stream.forEach(this::yield);
        }
    }
}

interface Stream<T> {
    <U> Stream<U> mapMulti(MultiFunction<T, U> mapper);
}

Probably not all the way there, but I think definitely better than what we've got now.

Comments welcome.

On 12/17/2012 10:18 PM, Brian Goetz wrote:
> So, of the names suggested here so far for flatMap, my favorite is the
> one inspired by Don -- mapMulti.  It still sounds like map, is pretty
> clear what it's about (multi-valued map), and it steers clear of a lot
> of other pitfalls.
>
> While the bikeshed paint is still wet, we can talk about the API. Here's
> an improved proposal.  This may not be perfect, but it's definitely
> better than what we have now.
>
> interface DownstreamContext<T> /* placeholder name */ {
>     void yield(T element);
>     void yield(T[] array);
>     void yield(Collection<T> collection);
>     void yield(Stream<T> stream);
>     // can add more
> }
>
> interface Multimapper<T, U> /* placeholder name */ {
>     void map(DownstreamContext<U> downstream, T element);
> }
>
> interface Stream<T> {
>     ...
>     <U> Stream<U> mapMulti(Multimapper<T, U> mapper);
>     ...
> }
>
>
> This handles the "generator" case that the current API is built around,
> but also handles the other cases well too:
>
> Example 1 -- collection.
>
>    foos.mapMulti((downstream, foo)
>                      -> downstream.yield(getBars(foo)))...
>
> Example 2 -- generator.
>
>    ints.mapMulti((d, i) -> { for (int j=0; j<i; j++)
>                                  d.yield(j);
>                            })
>
> Example 3 -- stream.
>
>    kids.mapMulti(
>        (d, k) -> d.yield(adults.stream().filter(a -> isParent(a, k))));
>
>
> The downstream context argument is still annoying, but I think is
> clearer than the current "sink" argument is.  The alternative would be
> to have N special-purpose functional interfaces and N overloads for the
> non-generator cases (stream, collection) in addition to the current
> generator form.
>
From Donald.Raab at gs.com Tue Dec 18 23:39:25 2012
From: Donald.Raab at gs.com (Raab, Donald)
Date: Wed, 19 Dec 2012 02:39:25 -0500
Subject: Fine Grained Coordinated Parallelism in a Real World Application
Message-ID: <6712820CB52CFB4D842561213A77C05404BE203951@GSCMAMP09EX.firmwide.corp.gs.com>

I thought some folks in this group might be interested in Moh's presentation titled "Fine Grained Coordinated Parallelism in a Real World Application" from QCon NY in June of this year. Infoq posted the video from the conference just a couple weeks ago.

http://www.infoq.com/presentations/Fine-Grained-Parallelism

Thanks,
Don
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121219/27839b8d/attachment.html
From forax at univ-mlv.fr Wed Dec 19 03:00:38 2012
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 19 Dec 2012 12:00:38 +0100
Subject: Getting rid of pull
In-Reply-To: <50D1403F.4010802@oracle.com>
References: <50D1403F.4010802@oracle.com>
Message-ID: <50D19E56.8000206@univ-mlv.fr>

On 12/19/2012 05:19 AM, Brian Goetz wrote:
> Currently, stream pipelines can operate in one of two modes: push or
> pull.
(This distinction is invisible to the user; the choice is made
> by the library based on a number of factors.)
>
> Given a pipeline:
>
> stream.filter(...).map(...).reduce(...)
>
> A "pull" traversal involves taking an Iterator for the stream,
> wrapping a filtering iterator around it, wrapping a mapping iterator
> around that, and having the reduce loop pull elements from the
> upstream iterator and accumulate the result.
>
> A "push" traversal involves creating a Sink for the reduction stage,
> wrapping a mapping sink around that, wrapping a filtering sink around
> that, and forEach'ing the elements from the source into the filtering
> sink, and then asking the reducing sink at the end for its result.
>
> Currently, all operations are required to operate in both modes; to
> wrap a Sink or an Iterator.
>
> Short-circuiting operations such as "findFirst" operate in "pull"
> mode, so as to be usable and performant even on infinite streams.
>
> Most other operations operate in push mode, because it is more
> efficient. Iterators have more per-element overhead; you have to call
> two methods (hasNext and next), which generally must be defensively
> coded, requiring duplicated activity between these two methods. Pull
> mode also means more heap write traffic (and therefore bad
> interactions with cardmarking) than push mode, since iterators are
> nearly always stateful. So the above chain in "pull" mode requires
> three stateful iterators (and the accumulation is done in a local),
> whereas in "push" mode it is two stateless sinks and a stateful sink.
> (Pipelines ending in forEach are fully stateless, though the forEach
> target will have side-effects.) So we try hard to use push mode
> wherever possible.
>
> Over the past few months, we've made a series of simplifications in
> the stream model, including restrictions against stream reuse.
This
> has eliminated some of the use cases where pull was required, such as:
>
> Iterator it = stream.iterator();
> T first = it.next();
> stream.forEach(block); // process cdr of stream
>
> This used to be legal, and forced the forEach into pull mode because
> the source iterator has already been partially consumed. Now this case
> (and many of its relatives, such as forked streams) is illegal,
> eliminating one of the use cases for pull. These simplifications
> enable us to consider getting rid of direct support for pull
> entirely. If we eliminate support for pull in the framework, huge
> chunks of code dealing with Iterators simply go away. This is a
> really desirable outcome -- iterator code is bulky and ugly and
> error-prone.
>
>
> The remaining use cases for "pull" are:
>
> - Short-circuit operations, such as findFirst/findAny.
> - Escape-hatch operations, whereby the client code requests an
> Iterator (or Spliterator) for the stream.
>
> Examples of "escape hatch" usage might be "give me every second
> element" or "give me the third element after the first blue element."
> While these are important use cases to continue to support, they are
> likely to be a small percentage of actual traversals.
>
>
> If we got rid of support for iterator wrapping, we could still
> simulate an Iterator as follows:
>
> - Construct a terminal sink which accumulates elements into a buffer
> (such as that used in flatMap)
> - Construct a push-oriented sink chain which flows into this buffer
> - Get an Iterator for the source
> - When more elements are needed, pull elements from the source, push
> them into the sink, and continue until out of elements or until
> something flows into the terminal buffering sink. Then consume
> elements from the buffering sink until gone, and if there are more
> elements remaining in the source, continue until the pipeline is dry.
> (Think of this as a "pushme-pullyou adapter.")
>
> This has slightly higher overhead, but it seems pretty reasonable, and
> doesn't get used that often. And then, all the code having to do with
> wrapping iterators can go away, and iterators are then only used as
> sources and escape hatches (and internally within short-circuit
> operations.) Pretty nice.

Yes, get rid of the pull mode please. Here is a small prototype that uses push only
http://igm.univ-mlv.fr/~forax/tmp/pushonlypipeline/src/java/util/stream/
as you can see you don't even need to propagate an exception to implement findFirst().

>
>
> We can even go one step farther, and eliminate iterators completely,
> but it involves some ugliness. The ugliness involves use of an
> exception to signal from a forEach target through the forEach source
> to the initiator of the forEach to stop dispensing elements. (We
> can eliminate most of the cost of the exception by not gathering the
> stack trace, which is by far the most expensive part.) Then,
> Spliterator need not even support an iterator() method, just a forEach
> method that doesn't fall apart if you throw an exception through it.
> (Similarly, we don't expose an iterator() method, just a version of
> forEach that lets the target say "no mas" which causes the exception
> to be thrown.) This would eliminate even more low-value code (and
> simplify the Spliterator spec) than the above approach. But the
> exception thing is kind of stinky.

You cannot do that easily, because you need one forEach, find, and reduce per type specialization.

>
>
> I don't see any reason to not do at least the first one.
> Rémi

From paul.sandoz at oracle.com Wed Dec 19 03:10:03 2012
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Wed, 19 Dec 2012 12:10:03 +0100
Subject: Getting rid of pull
In-Reply-To: <50D1403F.4010802@oracle.com>
References: <50D1403F.4010802@oracle.com>
Message-ID: <44F7FF45-B08B-4784-8E3E-8D68BF0C3A8F@oracle.com>

On Dec 19, 2012, at 5:19 AM, Brian Goetz wrote:
>
> If we got rid of support for iterator wrapping, we could still simulate an Iterator as follows:
>
> - Construct a terminal sink which accumulates elements into a buffer (such as that used in flatMap)
> - Construct a push-oriented sink chain which flows into this buffer
> - Get an Iterator for the source
> - When more elements are needed, pull elements from the source, push them into the sink, and continue until out of elements or until something flows into the terminal buffering sink. Then consume elements from the buffering sink until gone, and if there are more elements remaining in the source, continue until the pipeline is dry. (Think of this as a "pushme-pullyou adapter.")
>
> This has slightly higher overhead, but it seems pretty reasonable, and doesn't get used that often. And then, all the code having to do with wrapping iterators can go away, and iterators are then only used as sources and escape hatches (and internally within short-circuit operations.) Pretty nice.
>

I have a working patch (no tests fail) that removes iterator wrapping as described above. The result is very clean and greatly simplifies the operation implementations. The Stream.iterator() implementation becomes slightly more complicated but I think that is a reasonable cost to pay for the overall simplification and potential performance benefits.

> We can even go one step farther, and eliminate iterators completely, but it involves some ugliness. The ugliness involves use of an exception to signal from a forEach target through the forEach source to the initiator of the forEach to stop dispensing elements.
(We can eliminate most of the cost of the exception by not gathering the stack trace, which is by far the most expensive part.) Then, Spliterator need not even support an iterator() method, just a forEach method that doesn't fall apart if you throw an exception through it. (Similarly, we don't expose an iterator() method, just a version of forEach that lets the target say "no mas" which causes the exception to be thrown.) This would eliminate even more low-value code (and simplify the Spliterator spec) than the above approach. But the exception thing is kind of stinky.
>
> I don't see any reason to not do at least the first one.
>

+1
Paul.

From dl at cs.oswego.edu Wed Dec 19 05:37:39 2012
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 19 Dec 2012 08:37:39 -0500
Subject: Spliterator
In-Reply-To: <50D0CB74.1040406@oracle.com>
References: <50D0BDE2.2040402@cs.oswego.edu> <50D0CB74.1040406@oracle.com>
Message-ID: <50D1C323.2020707@cs.oswego.edu>

On 12/18/12 15:00, Brian Goetz wrote:
>>
>> 2. Because split() is stateful, it is unclear at
>> best what a return value > 1 for getNaturalSplits
>> might mean? All the child splits and their descendants?
>> Only the next #getNaturalSplits calls?
>> Flailing around left me with the conclusion that
>> the only sensible way to spec this is to rename as
>> boolean isSplittable(), thus clearly referring only to the
>> next call to split.
>
> I appreciate calling attention to the spec hole here, but I think this specific
> mitigation suggestion throws the bath toys out with the bathwater.
> Specifically, the boolean-vs-int return. There's no reason it can't be spec'ed
> and generalized to more than one subsequent split (and if an implementation
> doesn't want to promise that, return 1.)

Sorry, I don't get this.
Is the intent to be able to bypass a call to isSplittable
across multiple calls to split? That is, do you need to replace:

  while (s.isSplittable() && /* ... other conditions ...*/)
     ... s.split...
with:

  for (int splits = s.getNaturalSplits();...) {
    if (!/* ... other conditions ...*/)
      break;
    ... s.split...
  }

Or is there some other reason I'm not seeing?

(Or is getNaturalSplits a remnant from previous designs in which
split was not mutative and you could ask for multiple splits at once?)

>
> First, getNaturalSplits is purely advisory, since you can always return an empty
> spliterator from split().

To be useful, the spec must imply that if isSplittable returns false
the caller should not call split. But we cannot say such things
in an interface spec. The constraint that I listed makes it clear
that looping calls to split that do not check isSplittable
could infinitely loop.

>> /**
>> * Returns {@code false} if an invocation of {@code split()} is
>> * guaranteed to return an empty Spliterator. Otherwise the method
>> * implementation may choose a return value based on data
>> * structure constraints and efficiency considerations.
>> */
>> boolean isSplittable();
>

From brian.goetz at oracle.com Wed Dec 19 06:19:06 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 19 Dec 2012 09:19:06 -0500
Subject: Spliterator
In-Reply-To: <50D1C323.2020707@cs.oswego.edu>
References: <50D0BDE2.2040402@cs.oswego.edu> <50D0CB74.1040406@oracle.com> <50D1C323.2020707@cs.oswego.edu>
Message-ID: <50D1CCDA.8080601@oracle.com>

>> There's no reason it can't
>> be spec'ed
>> and generalized to more than one subsequent split (and if an
>> implementation
>> doesn't want to promise that, return 1.)
>
> Sorry, I don't get this.
> Is the intent to be able to bypass a call to isSplittable
> across multiple calls to split?

That's not the goal, though that is an effect.

> That is, do you need to replace:
>
> while (s.isSplittable() && /* ... other conditions ...*/)
> ... s.split...
>
> with:
>
> for (int splits = s.getNaturalSplits();...) {
> if (!/* ... other conditions ...*/)
> break;
> ... s.split...
> }
>
> Or is there some other reason I'm not seeing?
We use split() to build a tree of splits; this is mirrored by a tree of FJTs. I use this information to determine when split() is creating a new level of the tree, and when it is creating a new sibling at the same level.

         T firstChild = makeChild(spliterator.split());
         setPendingCount(naturalSplits);
         numChildren = naturalSplits + 1;
         children = firstChild;
         T curChild = firstChild;
         for (int i=naturalSplits-1; i >= 0; i--) {
             T newChild = makeChild((i > 0) ? spliterator.split() : spliterator);
             curChild.nextSibling = newChild;
             curChild = newChild;
         }
         for (T child=children.nextSibling; child != null; child=child.nextSibling)
             child.fork();
         firstChild.compute();

> (Or is getNaturalSplits a remnant from previous designs in which
> split was not mutative and you could ask for multiple splits at once?)

You could think of it as an alternate way to do that, yes. The point is: I see value to the possibility of arranging spliterators in other than a binary tree.

> To be useful, the spec must imply that if isSplittable returns false
> the caller should not call split.

Agreed.

> But we cannot say such things
> in an interface spec. The constraint that I listed makes it clear
> that looping calls to split that do not check isSplittable
> could infinitely loop.

Right. But if we s/isSplittable == false/getNaturalSplits == 0/, why don't we get that same promise?

I don't disagree with anything you're saying except the implicit assumption that no one would ever want to build other than a binary tree of splits. (And I don't understand what the split arity has to do with the other spec issue, and why they're being intertwined?)
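To make the n-ary intent concrete: below is a toy range splitter (purely illustrative, not a JDK Spliterator; the fixed 4-way fan-out advisory is an assumption for the example) consumed by the same n-splits-give-n+1-siblings recursion as the tree-building loop above:

```java
import java.util.*;

public class NarySplit {
    // Illustrative splitter over an int range. getNaturalSplits() advises how
    // many split() calls will divide the remaining elements evenly.
    public static class RangeSplitter {
        int lo, hi, childrenLeft;
        public RangeSplitter(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
            this.childrenLeft = getNaturalSplits() + 1;
        }
        // Advertise a 4-way fan-out (3 splits) while the range is big enough.
        public int getNaturalSplits() { return (hi - lo) >= 8 ? 3 : 0; }
        public RangeSplitter split() {
            if (childrenLeft <= 1)                       // re-advise a fresh level
                childrenLeft = getNaturalSplits() + 1;
            int take = (hi - lo) / childrenLeft--;       // split off a balanced share
            RangeSplitter child = new RangeSplitter(lo, lo + take);
            lo += take;
            return child;
        }
    }

    // Mirrors the loop above: n calls to split() produce n + 1 siblings at
    // this level, with the splitter itself acting as the last sibling.
    public static long sum(RangeSplitter s) {
        int n = s.getNaturalSplits();
        if (n == 0) {                                    // leaf: traverse sequentially
            long acc = 0;
            for (int i = s.lo; i < s.hi; i++) acc += i;
            return acc;
        }
        long acc = 0;
        for (int i = 0; i < n; i++) acc += sum(s.split());
        return acc + sum(s);                             // remainder kept by the parent
    }

    public static void main(String[] args) {
        System.out.println(sum(new RangeSplitter(0, 100)));  // 4950
    }
}
```

Each level fans out four ways instead of two, which is exactly the "other than a binary tree" arrangement under discussion.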
From dl at cs.oswego.edu Wed Dec 19 06:38:31 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 19 Dec 2012 09:38:31 -0500 Subject: Spliterator In-Reply-To: <50D1CCDA.8080601@oracle.com> References: <50D0BDE2.2040402@cs.oswego.edu> <50D0CB74.1040406@oracle.com> <50D1C323.2020707@cs.oswego.edu> <50D1CCDA.8080601@oracle.com> Message-ID: <50D1D167.3030801@cs.oswego.edu> On 12/19/12 09:19, Brian Goetz wrote: > We use split() to build a tree of splits; this is mirrored by a tree of FJTs. I > use this information to determine when split() is creating a new level of the > tree, and when it is creating a new sibling at the same level. > > T firstChild = makeChild(spliterator.split()); > setPendingCount(naturalSplits); > numChildren = naturalSplits + 1; > children = firstChild; > T curChild = firstChild; > for (int i=naturalSplits-1; i >= 0; i--) { > T newChild = makeChild((i > 0) ? spliterator.split() : > spliterator); > curChild.nextSibling = newChild; > curChild = newChild; > } > I'm trying hard to see the context or spec wording that would make this useful. Each time you call s.split, some fraction of s is split off. So it seems that the value you want here is something like: "return the number of times to split before you are logically at the next depth level, if such a level exists"? Which will be hard to state clearly. Do you have any existing examples of Spliterators that return values other than 1/0? That might help. > The point is: I see value to the possibility of arranging spliterators in other > than a binary tree. No, nothing (explicitly) about binary trees, only about supporting an incremental usage model. As in: split; maybe split some more; ... 
-Doug From forax at univ-mlv.fr Wed Dec 19 11:32:52 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 19 Dec 2012 20:32:52 +0100 Subject: flatMap In-Reply-To: <50D14A86.9050102@oracle.com> References: <50CFE070.5060705@oracle.com> <50D14A86.9050102@oracle.com> Message-ID: <50D21664.5090106@univ-mlv.fr> On 12/19/2012 06:03 AM, Brian Goetz wrote: > Pushing a strawman implementation that uses the following API: > > public interface MultiFunction { > public void apply(Collector collector, T element); > > /** A collector for values associated with a given input. Values > can be > * yielded individually, or in aggregates such as collections, > arrays, or > * streams; aggregates are flattened, so that yielding an array > containing > * [1, 2] is equivalent to yield(1); yield(2). > */ > public interface Collector { > void yield(U element); > > default void yield(Collection collection) { > for (U u : collection) > yield(u); > } > > default void yield(U[] array) { > for (U u : array) > yield(u); > } > > default void yield(Stream stream) { > stream.forEach(this::yield); > } > } > } > > interface Stream { > Stream mapMulti(MultiFunction mapper); > } > > > Probably not all the way there, but I think definitely better than > what we've got now. > > Comments welcome. public interface MultiFunction { public void apply(Collector collector, T element); /** A collector for values associated with a given input. Values can be * yielded individually, or in aggregates such as collections, arrays, or * streams; aggregates are flattened, so that yielding an array containing * [1, 2] is equivalent to yield(1); yield(2). 
*/ public interface Collector { void yieldInto(U element); default void yieldInto(Collection collection) { for (U u : collection) yieldInto(u); } default void yieldInto(U[] array) { for (U u : array) yieldInto(u); } default void yieldInto(Stream stream) { stream.forEach(this::yieldInto); } } } interface Stream { Stream mapMulti(MultiFunction mapper); } I think we should not use the name 'yield' because it can be a keyword added later if we introduce coroutines, generators, etc. Also, the interface Collector is too close to Destination; I think it should be the same one. Rémi > > > On 12/17/2012 10:18 PM, Brian Goetz wrote: >> So, of the names suggested here so far for flatMap, my favorite is the >> one inspired by Don -- mapMulti. It still sounds like map, is pretty >> clear what it's about (multi-valued map), and it steers clear of a lot >> of other pitfalls. >> >> While the bikeshed paint is still wet, we can talk about the API. Here's >> an improved proposal. This may not be perfect, but it's definitely >> better than what we have now. >> >> interface DownstreamContext /* placeholder name */ { >> void yield(T element); >> void yield(T[] array); >> void yield(Collection collection); >> void yield(Stream stream); >> // can add more >> } >> >> interface Multimapper /* placeholder name */ { >> void map(DownstreamContext downstream, T element); >> } >> >> interface Stream { >> ... >> Stream mapMulti(Multimapper mapper); >> ... >> } >> >> >> This handles the "generator" case that the current API is built around, >> but also handles the other cases well too: >> >> Example 1 -- collection. >> >> foos.mapMulti((downstream, foo) >> -> downstream.yield(getBars(foo)))... >> >> Example 2 -- generator. >> >> ints.mapMulti((d, i) -> { for (int j = 0; j < i; j++) >> d.yield(j); >> }) >> >> Example 3 -- stream.
>> >> kids.mapMulti( >> (d, k) -> d.yield(adults.stream().filter(a -> isParent(a, >> k)))); >> >> >> The downstream context argument is still annoying, but I think it is >> clearer than the current "sink" argument is. The alternative would be >> to have N special-purpose functional interfaces and N overloads for the >> non-generator cases (stream, collection) in addition to the current >> generator form. >> From brian.goetz at oracle.com Wed Dec 19 11:37:51 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 19 Dec 2012 14:37:51 -0500 Subject: flatMap In-Reply-To: <50D21664.5090106@univ-mlv.fr> References: <50CFE070.5060705@oracle.com> <50D14A86.9050102@oracle.com> <50D21664.5090106@univ-mlv.fr> Message-ID: <50D2178F.5000009@oracle.com> > I think we should not use the name 'yield' because it can be a keyword > added later if we introduce coroutine, generator, etc. I thought of that. Then I realized we already have "Thread.yield" among others in the libraries, so there's no reason to use a suboptimal name just to avoid using the y-word. Are you saying that yieldInto is actually a *better* name? Or are you just trying to avoid a possible collision? From forax at univ-mlv.fr Wed Dec 19 11:44:54 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 19 Dec 2012 20:44:54 +0100 Subject: flatMap In-Reply-To: <50D2178F.5000009@oracle.com> References: <50CFE070.5060705@oracle.com> <50D14A86.9050102@oracle.com> <50D21664.5090106@univ-mlv.fr> <50D2178F.5000009@oracle.com> Message-ID: <50D21936.2020803@univ-mlv.fr> On 12/19/2012 08:37 PM, Brian Goetz wrote: >> I think we should not use the name 'yield' because it can be a keyword >> added later if we introduce coroutine, generator, etc. > > I thought of that. Then I realized we already have "Thread.yield" > among others in the libraries, so there's no reason to use a > suboptimal name just to avoid using the y-word. > > Are you saying that yieldInto is actually a *better* name?
Or are you > just trying to avoid a possible collision? > > I like yieldInto, though there is perhaps a better name. Rémi From paul.sandoz at oracle.com Thu Dec 20 01:14:28 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 20 Dec 2012 10:14:28 +0100 Subject: Spliterator In-Reply-To: <50D1D167.3030801@cs.oswego.edu> References: <50D0BDE2.2040402@cs.oswego.edu> <50D0CB74.1040406@oracle.com> <50D1C323.2020707@cs.oswego.edu> <50D1CCDA.8080601@oracle.com> <50D1D167.3030801@cs.oswego.edu> Message-ID: On Dec 19, 2012, at 3:38 PM, Doug Lea

wrote: > > Do you have any existing examples of Spliterators that > return values other than 1/0? That might help. > In the source code there is something called a SpinedBuffer that has better resizing characteristics than ArrayList but only supports addition of elements (no removal of). SpinedBuffer holds an array of arrays (the spine), the size of individual arrays increases by some power of 2 as one goes down the spine. A Spliterator, that has not been split, of SpinedBuffer returns N - 1 for the natural splits when N > 1, where N is the number of arrays in the spine. Paul. From dl at cs.oswego.edu Thu Dec 20 06:53:20 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 20 Dec 2012 09:53:20 -0500 Subject: Spliterator In-Reply-To: References: <50D0BDE2.2040402@cs.oswego.edu> <50D0CB74.1040406@oracle.com> <50D1C323.2020707@cs.oswego.edu> <50D1CCDA.8080601@oracle.com> <50D1D167.3030801@cs.oswego.edu> Message-ID: <50D32660.2010304@cs.oswego.edu> On 12/20/12 04:14, Paul Sandoz wrote: > On Dec 19, 2012, at 3:38 PM, Doug Lea
wrote: >> >> Do you have any existing examples of Spliterators that return values other >> than 1/0? That might help. >> > > In the source code there is something called a SpinedBuffer that has better > resizing characteristics than ArrayList but only supports addition of > elements (no removal of). > > SpinedBuffer holds an array of arrays (the spine), the size of individual > arrays increases by some power of 2 as one goes down the spine. > > A Spliterator, that has not been split, of SpinedBuffer returns N - 1 for the > natural splits when N > 1, where N is the number of arrays in the spine. > This is a good illustration of the kind of Spliterator that supporting getNaturalSplits encourages. The best performing spliterator for such a class would keep dividing the range of spines by two, and then if asked, divide each array within sub-splits. Doing this allows good scale-free performance. That is, it has a chance of working well no matter how many cores you have. (This is why recursion and FJ are so heavily linked.) But the way it is now, given how this is supported in stream pipelines, it will blindly either over- or under- partition; possibly by enough to eliminate parallel speedups. If people want to write highly unbalanced spliterators, we cannot stop them. And in fact for hopelessly listy things like most Queues, the only plausible default is to split in the most imbalanced way possible, head::tail. But if the idea behind getNaturalSplits is to make it easier to implement a crummy Spliterator than a good one, then this adds to the reasons not to support this method. As I keep saying though, the main reason to prefer isSplittable is that you can write a simple unambiguous spec for it.
-Doug From brian.goetz at oracle.com Thu Dec 20 07:15:21 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 20 Dec 2012 10:15:21 -0500 Subject: Spliterator In-Reply-To: <50D32660.2010304@cs.oswego.edu> References: <50D0BDE2.2040402@cs.oswego.edu> <50D0CB74.1040406@oracle.com> <50D1C323.2020707@cs.oswego.edu> <50D1CCDA.8080601@oracle.com> <50D1D167.3030801@cs.oswego.edu> <50D32660.2010304@cs.oswego.edu> Message-ID: <50D32B89.5020802@oracle.com> > But if the idea behind getNaturalSplits is to make it > easier to implement a crummy Spliterator than a good one, > then this adds to the reasons not to support this > method. I think the idea is more oriented around allowing people other than Doug to write (admittedly suboptimal) spliterators for data structures for which the optimal binary split is not obvious. Case in point: TreeMap. A while back (when all we had was binary splits), there was a conversation that went something like this: Brian: TreeMap should fit nicely into a binary split model. Doug: Not so fast! There's data in the nodes. Which means you have to play a game where you decide which subtree gets the node data, and encode the path from the root as a bitmap indicating left-vs-right (which falls over completely on unbalanced trees.) Blech. Brian: Aw, crap. ... time passes ... Brian: Well, now that we have n-way splits, you can represent this trivially (though suboptimally) as (left node, my data, right node). Doug: Wait, I figured out a way to do it as binary splits. The moral of the story is: it took Doug+sidekick more than a few days to come up with a clean binary splitting solution. Which means for many developers, they would never get there, and they're locked out of the parallelism game for want of a Spliterator. Stepping back, why do we have Spliterator at all? Using Spliterator at all has a cost -- it is a tradeoff of less efficient element access for abstraction.
Highly tuned implementations like CHM do not benefit from Spliterator; the abstraction benefit is aimed at allowing arbitrary data structures (even non-thread-safe ones, assuming non-interference) to get the benefit of all these parallel algorithms at low entry cost. One goal here is to maximize the data structures that can get into the game relatively cheaply just by writing a Spliterator. > As I keep saying though, the main reason to prefer > isSplittable is that you can write a a simple > unambiguous spec for it. Will try to write something simpler and unambiguous, and Doug can poke holes in it. Maybe with a few iterations I can have my feature and Doug can have his simple spec, and everyone wins. From dl at cs.oswego.edu Thu Dec 20 07:29:22 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 20 Dec 2012 10:29:22 -0500 Subject: Spliterator In-Reply-To: <50D32B89.5020802@oracle.com> References: <50D0BDE2.2040402@cs.oswego.edu> <50D0CB74.1040406@oracle.com> <50D1C323.2020707@cs.oswego.edu> <50D1CCDA.8080601@oracle.com> <50D1D167.3030801@cs.oswego.edu> <50D32660.2010304@cs.oswego.edu> <50D32B89.5020802@oracle.com> Message-ID: <50D32ED2.1030807@cs.oswego.edu> On 12/20/12 10:15, Brian Goetz wrote: >> But if the idea behind getNaturalSplits is to make it >> easier to implement a crummy Spliterator than a good one, >> then this adds to the reasons not to support this >> method. > > I think the idea is more oriented around allowing people other than Doug to > write (admittedly suboptimal) spliterators for data structures for which the > optimal binary split is not obvious. Which is still completely possible under isSplittable. But if you cannot find a way to implement relatively balanced splits, then you will have to pay for it with more (annoying but easy) bookkeeping that keeps track of how many times split has been called. Which you'd have to track anyway. What could be better? 
-Doug From brian.goetz at oracle.com Thu Dec 20 09:40:23 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 20 Dec 2012 12:40:23 -0500 Subject: Getting rid of pull In-Reply-To: <50D1403F.4010802@oracle.com> References: <50D1403F.4010802@oracle.com> Message-ID: <50D34D87.4030101@oracle.com> The elimination of pull is complete. Close to 1000 lines of low-value code made the ultimate sacrifice in the effort. For short-circuit terminal ops (find, match) and iterator(), the way it works is we create an iterator as described in the initial mail on this thread. For short-circuit intermediate ops (limit), we've added an optional signalling mechanism to Sink to indicate "no mas", and the driver loop that pulls from the source iterator and pushes into the staging sink checks this flag before pushing more input. As a follow-on, this made a lot of simple ops (e.g., filter, map) so small that they could now be inlined into {Reference,Int}Pipeline, eliminating a lot of small classes. On 12/18/2012 11:19 PM, Brian Goetz wrote: > Currently, stream pipelines can operate in one of two modes: push or > pull. (This distinction is invisible to the user; the choice is made by > the library based on a number of factors.) > > Given a pipeline: > > stream.filter(...).map(...).reduce(...) > > A "pull" traversal involves taking an Iterator for the stream, wrapping > a filtering iterator around it, wrapping a mapping iterator around that, > and having the reduce loop pull elements from the upstream iterator and > accumulate the result. > > A "push" traversal involves creating a Sink for the reduction stage, > wrapping a mapping sink around that, wrapping a filtering sink around > that, and forEach'ing the elements from the source into the filtering > sink, and then asking the reducing sink at the end for its result. > > Currently, all operations are required to operate in both modes; to > wrap a Sink or an Iterator.
> > Short-circuiting operations such as "findFirst" operate in "pull" mode, > so as to be usable and performant even on infinite streams. > > Most other operations operate in push mode, because it is more > efficient. Iterators have more per-element overhead; you have to call > two methods (hasNext and next), which generally must be defensively > coded, requiring duplicated activity between these two methods. Pull > mode also means more heap write traffic (and therefore bad interactions > with cardmarking) than push mode, since iterators are nearly always > stateful. So the above chain in "pull" mode requires three stateful > iterators (and the accumulation is done in a local), whereas in "push" > mode it is two stateless sinks and a stateful sink. (Pipelines ending in > forEach are fully stateless, though the forEach target will have > side-effects.) So we try hard to use push mode wherever possible. > > Over the past few months, we've made a series of simplifications in the > stream model, including restrictions against stream reuse. This has > eliminated some of the use cases where pull was required, such as: > > Iterator it = stream.iterator(); > T first = it.next(); > stream.forEach(block); // process cdr of stream > > This used to be legal, and forced the forEach into pull mode because the > source iterator has already been partially consumed. Now this case (and > many of its relatives, such as forked streams) is illegal, eliminating > one of the use cases for pull. These simplifications enable us to > consider getting rid of direct support for pull entirely. If we > eliminate support for pull in the framework, huge chunks of code dealing > with Iterators simply go away. This is a really desirable outcome -- > iterator code is bulky and ugly and error-prone. > > > The remaining use cases for "pull" are: > > - Short-circuit operations, such as findFirst/findAny.
> - Escape-hatch operations, whereby the client code requests an > Iterator (or Spliterator) for the stream. > > Examples of "escape hatch" usage might be "give me every second element" > or "give me the third element after the first blue element." While > these are important use cases to continue to support, they are likely to > be a small percentage of actual traversals. > > > If we got rid of support for iterator wrapping, we could still simulate > an Iterator as follows: > > - Construct a terminal sink which accumulates elements into a buffer > (such as that used in flatMap) > - Construct a push-oriented sink chain which flows into this buffer > - Get an Iterator for the source > - When more elements are needed, pull elements from the source, push > them into the sink, and continue until out of elements or until > something flows into the terminal buffering sink. Then consume elements > from the buffering sink until gone, and if there are more elements > remaining in the source, continue until the pipeline is dry. (Think of > this as a "pushme-pullyou adapter.") > > This has slightly higher overhead, but it seems pretty reasonable, and > doesn't get used that often. And then, all the code having to do with > wrapping iterators can go away, and iterators are then only used as > sources and escape hatches (and internally within short-circuit > operations.) Pretty nice. > > > We can even go one step farther, and eliminate iterators completely, but > it involves some ugliness. The ugliness involves use of an exception to > signal from a forEach target through the forEach source to the initiator > of the forEach to about dispensing of elements. (We can eliminate most > of the cost of the exception by not gathering the stack trace, which is > by far the most expensive part.) Then, Spliterator need not even > support an iterator() method, just a forEach method that doesn't fall > apart if you throw an exception through it. 
(Similarly, we don't expose > an iterator() method, just a version of forEach that lets the target say > "no mas" which causes the exception to be thrown.) This would eliminate > even more low-value code (and simplify the Spliterator spec) than the > above approach. But the exception thing is kind of stinky. > > > I don't see any reason to not do at least the first one. > From brian.goetz at oracle.com Fri Dec 21 09:50:57 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 21 Dec 2012 12:50:57 -0500 Subject: Into Message-ID: <50D4A181.3050300@oracle.com> I'm starting to dislike "into". First, it's the only stream method which retains mutable state from the user. That's not great. Second, the parallel story is bad. People are going to write list.parallel(e -> e+1).into(new ArrayList<>()); which will do a whole lot of trivial computation in parallel, wait on the barrier implicit in sequential(), and then do an O(n) serial thing. Third, the semantics are weird; we do this clever trick where collections have to decide whether to do insertion in serial or parallel. But as we all learned from Spinal Tap, there's a fine line between clever and stupid. Instead, we could treat this like a mutable reduce, where leaves are reduced to a List, and lists are merged as we go up the tree. Even with dumb merging it is still going to be much faster than what we've got now; no barrier, no buffer the whole thing and copy, and the worst serial step is O(n/2) instead of O(n). So probably 3x better just by improving the serial fractions. But with a smarter combination step, we can do better still. If we have a "concatenated list view" operation (List concat(List a, List b), which returns a read-only conc-tree representation), then the big serial stage goes away. And, of course, building atop reduce makes the whole thing simpler; there are fewer ops that have their own distinct semantics, and the semantics of into() is about as weird as you get.
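A sketch of the mutable-reduce reading of into(): each leaf accumulates into its own list and sibling lists are merged going up the tree, with no barrier and no buffer-the-whole-thing-and-copy step. This uses Collectors.toList() from java.util.stream as it eventually shipped, purely as an illustration of the shape.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class IntoAsReduceDemo {
    // into(new ArrayList<>()) recast as a mutable reduce: per-leaf
    // accumulation, pairwise merges on the way up, encounter order kept.
    static List<Integer> incremented(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(e -> e + 1)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(incremented(5)); // [1, 2, 3, 4, 5]
    }
}
```

Even with naive list merging the serial fraction is bounded by the top-level merge, which is the improvement argued for above.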
Now that the tabulators framework gets users comfortable with the explicit choice between functional and concurrent aggregation for tabulation, it is a much shorter hop to get there. So let's build on that and find some sort of way to surface mutable and concurrent versions of "into". (Currently we have no good concurrent list-shaped collections, but hopefully that changes.) Something like: Stream.tabulate(collector(ArrayList::new)) Stream.tabulate(concurrentCollector(ConcurrentFooList::new)) Maybe with some rename of tabulate. I think there's a small reorganization of naming lurking here (involving tabulate, Tabulator, ConcurrentTabulator, MutableReducer, reduce) that recasts into() either as an explicit functional or concurrent tabulation. And one more tricky+slow special-purpose op bites the dust, in favor of something that builds on our two favorite primitives, fold (order-preserving) and forEach (not order-preserving.) From brian.goetz at oracle.com Fri Dec 21 12:53:56 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 21 Dec 2012 15:53:56 -0500 Subject: cumulate Message-ID: <50D4CC64.4090105@oracle.com> After an offline conversation with Doug, we're considering ditching cumulate from Streams. Reasons include: 1. Everybody looks at it and says WTF? And then has a YAGNI fit about throwing the kitchen sink into this API. 2. The form in which cumulation is exposed here -- stream in, stream out -- is really not all that useful to algorithms that need it. It would be better to expose it as an operation on Arrays instead. From kevinb at google.com Fri Dec 21 12:55:22 2012 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 21 Dec 2012 12:55:22 -0800 Subject: cumulate In-Reply-To: <50D4CC64.4090105@oracle.com> References: <50D4CC64.4090105@oracle.com> Message-ID: +1 (of course) On Fri, Dec 21, 2012 at 12:53 PM, Brian Goetz wrote: > After an offline conversation with Doug, we're considering ditching > cumulate from Streams. Reasons include: > > 1.
Everybody looks at it and says WTF? And then has a YAGNI fit about > throwing the kitchen sink into this API. > > 2. The form in which cumulation is exposed here -- stream in, stream out > -- is really not all that useful to algorithms that need it. It would be > better to expose it as an operation on Arrays instead. > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Fri Dec 21 13:14:08 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 21 Dec 2012 22:14:08 +0100 Subject: cumulate In-Reply-To: References: <50D4CC64.4090105@oracle.com> Message-ID: <50D4D120.6070303@univ-mlv.fr> On 12/21/2012 09:55 PM, Kevin Bourrillion wrote: > +1 (of course) yes, +1 too. Rémi > > > On Fri, Dec 21, 2012 at 12:53 PM, Brian Goetz > wrote: > > After an offline conversation with Doug, we're considering > ditching cumulate from Streams. Reasons include: > > 1. Everybody looks at it and says WTF? And then has a YAGNI fit > about throwing the kitchen sink into this API. > > 2. The form in which cumulation is exposed here -- stream in, > stream out -- is really not all that useful to algorithms that need > it. It would be better to expose it as an operation on Arrays instead. > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From brian.goetz at oracle.com Fri Dec 21 13:31:19 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 21 Dec 2012 16:31:19 -0500 Subject: cumulate In-Reply-To: <50D4D120.6070303@univ-mlv.fr> References: <50D4CC64.4090105@oracle.com> <50D4D120.6070303@univ-mlv.fr> Message-ID: <50D4D527.1010102@oracle.com> It's gone. (Well, not gone. Mercurial history is still there.)
I propose this as the replacement: In Arrays: void parallelPrefix(T[], int offset, int length, BinaryOperator); void parallelPrefix(int[], int offset, int length, IntBinaryOperator); void parallelPrefix(long[], int offset, int length, LongBinaryOperator); void parallelPrefix(double[], int offset, int length, DoubleBinaryOperator); plus trampolines for the offset=0, length=array.length case. Doug already has code that is almost identical to this. Maybe he will contribute it :) On 12/21/2012 4:14 PM, Remi Forax wrote: > On 12/21/2012 09:55 PM, Kevin Bourrillion wrote: >> +1 (of course) > > yes, +1 too. > > R?mi > >> >> >> On Fri, Dec 21, 2012 at 12:53 PM, Brian Goetz > > wrote: >> >> After an offline conversation with Doug, we're considering >> ditching cumulate from Streams. Reasons include: >> >> 1. Everybody looks at it and says WTF? And then has a YAGNI fit >> about throwing the kitchen sink into this API. >> >> 2. The form in which cumulation is exposed here -- stream in, >> stream out -- is really all that useful to algorithms that need >> it. It would be better to expose as an operation on Arrays instead. >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >> > From brian.goetz at oracle.com Fri Dec 21 13:47:57 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 21 Dec 2012 16:47:57 -0500 Subject: Into In-Reply-To: <50D4A181.3050300@oracle.com> References: <50D4A181.3050300@oracle.com> Message-ID: <50D4D90D.4060702@oracle.com> If we get rid of into(), and replace it with an explicit reduction [1], then we may be able to get rid of sequential() too. The primary use case for sequential() was for using non-thread-safe Collections as an into() target. 
The convoluted into() turns the problem around on the target, calling target.addAll(stream) on it, and the typical implementation of into() is like the one in Collection: default void addAll(Stream stream) { if (stream.isParallel()) stream = stream.sequential(); stream.forEach(this::add); } or more compactly default void addAll(Stream stream) { stream.sequential().forEach(this::add); } since sequential() is now a no-op on sequential streams. The code looks pretty, but the implementation is not; sequential() is a barrier, meaning you have to stop and collect all the elements into a temporary tree, and then dump them into the target. But it is not obvious that it is a barrier, so people will be surprised. (And on infinite streams, it's bad.) What used to be implicit in sequential() can now be made explicit with: if (stream.isParallel()) stream = stream...whateverWeCallOrderPreservingInto().stream() That offers similar semantics and at least as good performance, while also being more transparent and requiring one fewer weird stream operation. (I don't yet see the way to getting rid of .parallel(), but we can possibly move it out of Stream and into a static method Streams.parallel(stream), at some loss of discoverability. But we can discuss.) [1] Actually, we have to replace it with two explicit reductions, or more precisely, a reduction and a for-eaching. One is the pure reduction case that involves merging, and is suitable for non-thread-safe collections (and required if order preservation is desired); the other is the concurrent case, where we bombard a concurrent collection with puts and hope it manages to sort them out. The two are semantically very different; one is a reduce and the other is a forEach, and so they should have different manifestations in the code. Though there are really no concurrent Collections right now anyway (though you could fake a concurrent Set with a concurrent Map.) 
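A minimal sketch of the two shapes described in [1]: a merge-based mutable reduce (order-preserving, safe for non-thread-safe targets) versus a concurrent forEach (no order guarantee, target must be thread-safe). The three-argument collect form and the collection names here come from the API as it later shipped and are used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

public class TwoIntoFlavorsDemo {
    // Flavor 1: pure reduction -- per-leaf ArrayLists merged going up
    // the tree; preserves encounter order, no shared mutation.
    static List<Integer> orderedInto() {
        return IntStream.range(0, 8).parallel().boxed()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // Flavor 2: concurrent forEach -- bombard a concurrent collection
    // with adds and let it sort them out; arbitrary order.
    static Queue<Integer> concurrentInto() {
        Queue<Integer> q = new ConcurrentLinkedQueue<>();
        IntStream.range(0, 8).parallel().boxed().forEach(q::add);
        return q;
    }

    public static void main(String[] args) {
        System.out.println(orderedInto());          // [0, 1, 2, 3, 4, 5, 6, 7]
        System.out.println(concurrentInto().size()); // 8, in some arbitrary order
    }
}
```

The semantic split is visible in the code: one is a reduce, the other is a forEach, which is exactly why they deserve different manifestations in the API.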
On 12/21/2012 12:50 PM, Brian Goetz wrote: > I'm starting to dislike "into". > > First, it's the only stream method which retains mutable state from the > user. That's not great. > > Second, the parallel story is bad. People are going to write > > list.parallel(e -> e+1).into(new ArrayList<>()); > > which will do a whole lot of trivial computation in parallel, wait on > the barrier implicit in sequential(), and then do an O(n) serial thing. > > Third, the semantics are weird; we do this clever trick where > collections have to decide whether to do insertion in serial or > parallel. But as we all learned from Spinal Tap, there's a fine line > between clever and stupid. > > Instead, we could treat this like a mutable reduce, where leaves are > reduced to a List, and lists are merged as we go up the tree. Even with > dumb merging is still going to be much faster than what we've got now; > no barrier, no buffer the whole thing and copy, and the worst serial > step is O(n/2) instead of O(n). So probably 3x better just by improving > the serial fractions. But with a smarter combination step, we can do > better still. If we have a "concatenated list view" operation (List > concat(List a, List b)), which returns a read-only, conc-tree > representation), then the big serial stage goes away. > > And, of course, building atop reduce makes the whole thing simpler; > there are fewer ops that have their own distinct semantics, and the > semantics of into() is about as weird as you get. > > > Now that the tabulators framework gets users comfortable with the > explicit choice between functional and concurrent aggregation for > tabulation, it is a much shorter hop to get there. So let's build on > that and find some sort of way to surface mutable and concurrent > versions of "into". (Currently we have no good concurrent list-shaped > collections, but hopefully that changes.) 
> > Something like: > > Stream.tabulate(collector(ArrayList::new)) > Stream.tabulate(concurrentCollector(ConcurrentFooList::new)) > > Maybe with some rename of tabulate. > > I think there's a small reorganization of naming lurking here (involving > tabulate, Tabulator, ConcurrentTabulator, MutableReducer, reduce) that > recasts into() either as an explicit functional or concurrent > tabulation. And one more tricky+slow special-purpose op bites the dust, > in favor of something that builds on our two favorite primitives, fold > (order-preserving) and forEach (not order-preserving.) > > From brian.goetz at oracle.com Fri Dec 21 14:13:54 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 21 Dec 2012 17:13:54 -0500 Subject: unordered() Message-ID: <50D4DF22.4070805@oracle.com> So, the move to a more explicit choice of merging or concurrent tabulation also reduces (heh) the need for unordered(), though it does not eliminate it completely. (Limit, cancelation, and duplicate removal all have optimized versions if encounter order is not significant.) Kevin pointed out that .unordered() is pretty easy to miss, and people will not know that they don't know about it. One possibility is to make it more explicit at one end of the pipeline or the other (the only operation that is order-injecting is sorted(), and presumably if you are sorting you really care about encounter order for the downstream ops, otherwise the sort was a waste of time.) The proposed tabulator / reducer stuff makes the order-sensitivity clear at the tail end, which is a good place to put it -- the user should know whether a reduce or a forEach is what they want -- if not the user, who? (Only the user knows whether he cares about order or not, and knows whether his combination functions are commutative or not.)
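As a concrete illustration of leaving the ordering choice to the user at the tail end: findFirst respects encounter order while findAny opts out of it, and only the caller knows which matches the intent. Method names are from java.util.stream as it shipped; a runnable sketch, not the API under discussion.

```java
import java.util.stream.IntStream;

public class OrderDemo {
    // Order-sensitive: always returns the first match in encounter order.
    static int first() {
        return IntStream.range(0, 1000).parallel()
                .filter(i -> i > 0 && i % 7 == 0)
                .findFirst().getAsInt(); // always 7
    }

    // Order-insensitive: returns whichever match a thread finds first.
    static int any() {
        return IntStream.range(0, 1000).parallel()
                .filter(i -> i > 0 && i % 7 == 0)
                .findAny().getAsInt(); // some positive multiple of 7
    }

    public static void main(String[] args) {
        System.out.println(first());
        System.out.println(any() % 7); // 0
    }
}
```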
The other less-ignorable place to put an ordering opt-out is at the head; we could make things clearer by adding .parallelUnorderedStream() alongside .stream() and .parallelStream(). The obvious implementation of parallelUnorderedStream is: default Stream parallelUnorderedStream() { return parallelStream().unordered(); } which is also the most efficient place to put the .unordered (at the head.) From dl at cs.oswego.edu Fri Dec 21 15:58:27 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 21 Dec 2012 18:58:27 -0500 Subject: cumulate In-Reply-To: <50D4D527.1010102@oracle.com> References: <50D4CC64.4090105@oracle.com> <50D4D120.6070303@univ-mlv.fr> <50D4D527.1010102@oracle.com> Message-ID: <50D4F7A3.5080009@cs.oswego.edu> On 12/21/12 16:31, Brian Goetz wrote: > > In Arrays: > void parallelPrefix(T[], int offset, int length, BinaryOperator); > void parallelPrefix(int[], int offset, int length, IntBinaryOperator); > void parallelPrefix(long[], int offset, int length, LongBinaryOperator); > void parallelPrefix(double[], int offset, int length, DoubleBinaryOperator); > The reason I initially suggested a name other than "parallelPrefix" is that the term is used with either of two subtly different variations; one computing (in place) the cumulation up to but not including each element, and the other including the element. Given the parameterization, our version can only be the latter. (Otherwise it would require an identity base and return the total.) So I'm OK with it, but still slightly nervous. Example: [1, 2, 3, 4] -> modify in place to: v1 [0, 1, 3, 6] -> return 10 v2 [1, 3, 6, 10] -> void (our version.) Each of the versions is a little handier than the other for some purposes. But it is easy enough to adapt any usages of one to use the other.
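For concreteness, the inclusive "v2" semantics can be demonstrated with the two-argument form of Arrays.parallelPrefix as it exists in java.util.Arrays (a sketch; the thread above proposes the offset/length overloads).

```java
import java.util.Arrays;

public class PrefixDemo {
    // Inclusive cumulation in place: slot i receives the running total
    // up to and including element i; the method returns void.
    static int[] inclusiveScan(int[] a) {
        Arrays.parallelPrefix(a, (x, y) -> x + y);
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(inclusiveScan(new int[]{1, 2, 3, 4})));
        // [1, 3, 6, 10]
    }
}
```

Adapting to the exclusive "v1" form is a matter of shifting the result right by one slot and seeding with the identity, as noted above.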
-Doug From paul.sandoz at oracle.com Sat Dec 22 06:59:14 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Sat, 22 Dec 2012 15:59:14 +0100 Subject: Into In-Reply-To: <50D4D90D.4060702@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> Message-ID: <4FEFF772-76CA-4817-97F5-B8DF61D6C63C@oracle.com> On Dec 21, 2012, at 10:47 PM, Brian Goetz wrote: > If we get rid of into(), and replace it with an explicit reduction [1], then we may be able to get rid of sequential() too. > > The primary use case for sequential() was for using non-thread-safe Collections as an into() target. The convoluted into() turns the problem around on the target, calling target.addAll(stream) on it, and the typical implementation of into() is like the one in Collection: > > default void addAll(Stream stream) { > if (stream.isParallel()) > stream = stream.sequential(); > stream.forEach(this::add); > } > > or more compactly > > default void addAll(Stream stream) { > stream.sequential().forEach(this::add); > } > > since sequential() is now a no-op on sequential streams. > > The code looks pretty, but the implementation is not; sequential() is a barrier, meaning you have to stop and collect all the elements into a temporary tree, and then dump them into the target. But it is not obvious that it is a barrier, so people will be surprised. (And on infinite streams, it's bad.) > Yes, it catches people out. > What used to be implicit in sequential() can now be made explicit with: > > if (stream.isParallel()) > stream = stream...whateverWeCallOrderPreservingInto().stream() > > That offers similar semantics and at least as good performance, while also being more transparent and requiring one fewer weird stream operation. > > (I don't yet see the way to getting rid of .parallel(), but we can possibly move it out of Stream and into a static method Streams.parallel(stream), at some loss of discoverability. But we can discuss.) 
> I was recently thinking about alternative solutions to "sequential()" to reduce the barrier. An alternative is to retain some form of sequential that is a partial barrier, retains left-to-right information, and makes data from leaf nodes available in the correct order. We could use a special reducing task that pushes leaf nodes onto a blocking queue (perhaps a priority queue if we can provide a sequence identifier/string for each leaf node). A special spliterator could peek/pop off that blocking queue, and that spliterator can be used as input to the new sequential stream. It would likely be more performant (no waiting until all leaf nodes are processed) and less surprising to the developer (especially with infinite streams), at the expense of some internal complexity for the reducer and spliterator pair. It all depends on what the most common cases are to go sequential. If it is mostly a crutch to stuff things into a non-concurrent data structure then we can use the "reducers/tabulators" as you indicate and get rid of sequential, which suggests a refactoring of parallel(). Anyone got any use cases for a sequential that is a partial barrier? Paul. From joe.bowbeer at gmail.com Sat Dec 22 08:55:20 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 22 Dec 2012 08:55:20 -0800 Subject: Into In-Reply-To: <50D4D90D.4060702@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> Message-ID: The main use case is sequential().forEach(), which inserts any ol' for-loop into a computation. On Dec 21, 2012 1:48 PM, "Brian Goetz" wrote: > If we get rid of into(), and replace it with an explicit reduction [1], > then we may be able to get rid of sequential() too. > > The primary use case for sequential() was for using non-thread-safe > Collections as an into() target. 
The convoluted into() turns the problem > around on the target, calling target.addAll(stream) on it, and the typical > implementation of into() is like the one in Collection: > > default void addAll(Stream stream) { > if (stream.isParallel()) > stream = stream.sequential(); > stream.forEach(this::add); > } > > or more compactly > > default void addAll(Stream stream) { > stream.sequential().forEach(this::add); > } > > since sequential() is now a no-op on sequential streams. > > The code looks pretty, but the implementation is not; sequential() is a > barrier, meaning you have to stop and collect all the elements into a > temporary tree, and then dump them into the target. But it is not obvious > that it is a barrier, so people will be surprised. (And on infinite > streams, it's bad.) > > What used to be implicit in sequential() can now be made explicit with: > > if (stream.isParallel()) > stream = stream...whateverWeCallOrderPreservingInto().stream() > > That offers similar semantics and at least as good performance, while also > being more transparent and requiring one fewer weird stream operation. > > (I don't yet see the way to getting rid of .parallel(), but we can > possibly move it out of Stream and into a static method > Streams.parallel(stream), at some loss of discoverability. But we can > discuss.) > > > [1] Actually, we have to replace it with two explicit reductions, or more > precisely, a reduction and a for-eaching. One is the pure reduction case > that involves merging, and is suitable for non-thread-safe collections (and > required if order preservation is desired); the other is the concurrent > case, where we bombard a concurrent collection with puts and hope it > manages to sort them out. The two are semantically very different; one is a > reduce and the other is a forEach, and so they should have different > manifestations in the code. 
Though there are really no concurrent > Collections right now anyway (though you could fake a concurrent Set with a > concurrent Map.) > > > > On 12/21/2012 12:50 PM, Brian Goetz wrote: > >> I'm starting to dislike "into". >> >> First, it's the only stream method which retains mutable state from the >> user. That's not great. >> >> Second, the parallel story is bad. People are going to write >> >> list.parallel(e -> e+1).into(new ArrayList<>()); >> >> which will do a whole lot of trivial computation in parallel, wait on >> the barrier implicit in sequential(), and then do an O(n) serial thing. >> >> Third, the semantics are weird; we do this clever trick where >> collections have to decide whether to do insertion in serial or >> parallel. But as we all learned from Spinal Tap, there's a fine line >> between clever and stupid. >> >> Instead, we could treat this like a mutable reduce, where leaves are >> reduced to a List, and lists are merged as we go up the tree. Even with >> dumb merging is still going to be much faster than what we've got now; >> no barrier, no buffer the whole thing and copy, and the worst serial >> step is O(n/2) instead of O(n). So probably 3x better just by improving >> the serial fractions. But with a smarter combination step, we can do >> better still. If we have a "concatenated list view" operation (List >> concat(List a, List b)), which returns a read-only, conc-tree >> representation), then the big serial stage goes away. >> >> And, of course, building atop reduce makes the whole thing simpler; >> there are fewer ops that have their own distinct semantics, and the >> semantics of into() is about as weird as you get. >> >> >> Now that the tabulators framework gets users comfortable with the >> explicit choice between functional and concurrent aggregation for >> tabulation, it is a much shorter hop to get there. So let's build on >> that and find some sort of way to surface mutable and concurrent >> versions of "into". 
(Currently we have no good concurrent list-shaped >> collections, but hopefully that changes.) >> >> Something like: >> >> Stream.tabulate(collector(ArrayList::new)) >> Stream.tabulate(concurrentCollector(ConcurrentFooList::new)) >> >> Maybe with some rename of tabulate. >> >> I think there's a small reorganization of naming lurking here (involving >> tabulate, Tabulator, ConcurrentTabulator, MutableReducer, reduce) that >> recasts into() either as an explicit functional or concurrent >> tabulation. And one more tricky+slow special-purpose op bites the dust, >> in favor of something that builds on our two favorite primitives, fold >> (order-preserving) and forEach (not order-preserving.) From brian.goetz at oracle.com Sat Dec 22 09:16:20 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 22 Dec 2012 12:16:20 -0500 Subject: Into In-Reply-To: References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> Message-ID: <50D5EAE4.2080208@oracle.com> Right. You want to do the upstream stuff in parallel, then you want to do the downstream stuff (a) serially, (b) in the current thread, and probably (c) in encounter order. So, assume for sake of discussion that we have some form of .toList(), whether as a "native" operation or some sort of reduce/combine/tabulate. Then you can say: parallelStream()...toList().forEach(...) and the list-building will happen in parallel and then forEach can help sequentially. Given that, is there any reason left for sequential()? On 12/22/2012 11:55 AM, Joe Bowbeer wrote: > The main use case is sequential().forEach(), which inserts any ol' > for-loop into a computation. 
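Brian's pipeline above can be sketched end to end; collect(Collectors.toList()) stands in here for whatever form .toList() finally takes (an assumption, since the naming was still under discussion):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelThenSequential {
    public static void main(String[] args) {
        // Upstream work runs in parallel; the collected list is then
        // iterated sequentially, in the current thread, in encounter order.
        List<Integer> squares = Stream.of(1, 2, 3, 4)
                .parallel()
                .map(x -> x * x)
                .collect(Collectors.toList());
        squares.forEach(System.out::println); // 1, 4, 9, 16 in order
    }
}
```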
> > On Dec 21, 2012 1:48 PM, "Brian Goetz" > wrote: > > If we get rid of into(), and replace it with an explicit reduction > [1], then we may be able to get rid of sequential() too. > > The primary use case for sequential() was for using non-thread-safe > Collections as an into() target. The convoluted into() turns the > problem around on the target, calling target.addAll(stream) on it, > and the typical implementation of into() is like the one in Collection: > > default void addAll(Stream stream) { > if (stream.isParallel()) > stream = stream.sequential(); > stream.forEach(this::add); > } > > or more compactly > > default void addAll(Stream stream) { > stream.sequential().forEach(__this::add); > } > > since sequential() is now a no-op on sequential streams. > > The code looks pretty, but the implementation is not; sequential() > is a barrier, meaning you have to stop and collect all the elements > into a temporary tree, and then dump them into the target. But it > is not obvious that it is a barrier, so people will be surprised. > (And on infinite streams, it's bad.) > > What used to be implicit in sequential() can now be made explicit with: > > if (stream.isParallel()) > stream = stream...__whateverWeCallOrderPreservingI__nto().stream() > > That offers similar semantics and at least as good performance, > while also being more transparent and requiring one fewer weird > stream operation. > > (I don't yet see the way to getting rid of .parallel(), but we can > possibly move it out of Stream and into a static method > Streams.parallel(stream), at some loss of discoverability. But we > can discuss.) > > > [1] Actually, we have to replace it with two explicit reductions, or > more precisely, a reduction and a for-eaching. 
One is the pure > reduction case that involves merging, and is suitable for > non-thread-safe collections (and required if order preservation is > desired); the other is the concurrent case, where we bombard a > concurrent collection with puts and hope it manages to sort them > out. The two are semantically very different; one is a reduce and > the other is a forEach, and so they should have different > manifestations in the code. Though there are really no concurrent > Collections right now anyway (though you could fake a concurrent Set > with a concurrent Map.) > > > > On 12/21/2012 12:50 PM, Brian Goetz wrote: > > I'm starting to dislike "into". > > First, it's the only stream method which retains mutable state > from the > user. That's not great. > > Second, the parallel story is bad. People are going to write > > list.parallel(e -> e+1).into(new ArrayList<>()); > > which will do a whole lot of trivial computation in parallel, > wait on > the barrier implicit in sequential(), and then do an O(n) serial > thing. > > Third, the semantics are weird; we do this clever trick where > collections have to decide whether to do insertion in serial or > parallel. But as we all learned from Spinal Tap, there's a fine > line > between clever and stupid. > > Instead, we could treat this like a mutable reduce, where leaves are > reduced to a List, and lists are merged as we go up the tree. > Even with > dumb merging is still going to be much faster than what we've > got now; > no barrier, no buffer the whole thing and copy, and the worst serial > step is O(n/2) instead of O(n). So probably 3x better just by > improving > the serial fractions. But with a smarter combination step, we > can do > better still. If we have a "concatenated list view" operation (List > concat(List a, List b)), which returns a read-only, conc-tree > representation), then the big serial stage goes away. 
> > And, of course, building atop reduce makes the whole thing simpler; > there are fewer ops that have their own distinct semantics, and the > semantics of into() is about as weird as you get. > > > Now that the tabulators framework gets users comfortable with the > explicit choice between functional and concurrent aggregation for > tabulation, it is a much shorter hop to get there. So let's > build on > that and find some sort of way to surface mutable and concurrent > versions of "into". (Currently we have no good concurrent > list-shaped > collections, but hopefully that changes.) > > Something like: > > Stream.tabulate(collector(__ArrayList::new)) > Stream.tabulate(__concurrentCollector(__ConcurrentFooList::new)) > > Maybe with some rename of tabulate. > > I think there's a small reorganization of naming lurking here > (involving > tabulate, Tabulator, ConcurrentTabulator, MutableReducer, > reduce) that > recasts into() either as an explicit functional or concurrent > tabulation. And one more tricky+slow special-purpose op bites > the dust, > in favor of something that builds on our two favorite > primitives, fold > (order-preserving) and forEach (not order-preserving.) > > From forax at univ-mlv.fr Sat Dec 22 09:29:40 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 22 Dec 2012 18:29:40 +0100 Subject: Into In-Reply-To: <50D5EAE4.2080208@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> Message-ID: <50D5EE04.1010602@univ-mlv.fr> On 12/22/2012 06:16 PM, Brian Goetz wrote: > Right. You want to do the upstream stuff in parallel, then you want to > do the downstream stuff (a) serially, (b) in the current thread, and > probably (c) in encounter order. > > So, assume for sake of discussion that we have some form of .toList(), > whether as a "native" operation or some sort of > reduce/combine/tabulate. Then you can say: > > parallelStream()...toList().forEach(...) 
> > and the list-building will happen in parallel and then forEach can > help sequentially. > > Given that, is there any reason left for sequential()? You need also a toSet() and yes, in that case you don't need sequential() and it's better because it's more explicit. But there is still a lot of cases where you are sequential and you want to control the destination collection, that's why we need into(). I don't think it's a good idea to oppose toList() and into() both have their purposes. R?mi > > > > On 12/22/2012 11:55 AM, Joe Bowbeer wrote: >> The main use case is sequential().forrEach(), which inserts any ol' >> for-loop into a computation. >> >> On Dec 21, 2012 1:48 PM, "Brian Goetz" > > wrote: >> >> If we get rid of into(), and replace it with an explicit reduction >> [1], then we may be able to get rid of sequential() too. >> >> The primary use case for sequential() was for using non-thread-safe >> Collections as an into() target. The convoluted into() turns the >> problem around on the target, calling target.addAll(stream) on it, >> and the typical implementation of into() is like the one in >> Collection: >> >> default void addAll(Stream stream) { >> if (stream.isParallel()) >> stream = stream.sequential(); >> stream.forEach(this::add); >> } >> >> or more compactly >> >> default void addAll(Stream stream) { >> stream.sequential().forEach(__this::add); >> } >> >> since sequential() is now a no-op on sequential streams. >> >> The code looks pretty, but the implementation is not; sequential() >> is a barrier, meaning you have to stop and collect all the elements >> into a temporary tree, and then dump them into the target. But it >> is not obvious that it is a barrier, so people will be surprised. >> (And on infinite streams, it's bad.) 
>> >> What used to be implicit in sequential() can now be made explicit >> with: >> >> if (stream.isParallel()) >> stream = >> stream...__whateverWeCallOrderPreservingI__nto().stream() >> >> That offers similar semantics and at least as good performance, >> while also being more transparent and requiring one fewer weird >> stream operation. >> >> (I don't yet see the way to getting rid of .parallel(), but we can >> possibly move it out of Stream and into a static method >> Streams.parallel(stream), at some loss of discoverability. But we >> can discuss.) >> >> >> [1] Actually, we have to replace it with two explicit reductions, or >> more precisely, a reduction and a for-eaching. One is the pure >> reduction case that involves merging, and is suitable for >> non-thread-safe collections (and required if order preservation is >> desired); the other is the concurrent case, where we bombard a >> concurrent collection with puts and hope it manages to sort them >> out. The two are semantically very different; one is a reduce and >> the other is a forEach, and so they should have different >> manifestations in the code. Though there are really no concurrent >> Collections right now anyway (though you could fake a concurrent Set >> with a concurrent Map.) >> >> >> >> On 12/21/2012 12:50 PM, Brian Goetz wrote: >> >> I'm starting to dislike "into". >> >> First, it's the only stream method which retains mutable state >> from the >> user. That's not great. >> >> Second, the parallel story is bad. People are going to write >> >> list.parallel(e -> e+1).into(new ArrayList<>()); >> >> which will do a whole lot of trivial computation in parallel, >> wait on >> the barrier implicit in sequential(), and then do an O(n) serial >> thing. >> >> Third, the semantics are weird; we do this clever trick where >> collections have to decide whether to do insertion in serial or >> parallel. But as we all learned from Spinal Tap, there's a fine >> line >> between clever and stupid. 
>> >> Instead, we could treat this like a mutable reduce, where >> leaves are >> reduced to a List, and lists are merged as we go up the tree. >> Even with >> dumb merging is still going to be much faster than what we've >> got now; >> no barrier, no buffer the whole thing and copy, and the worst >> serial >> step is O(n/2) instead of O(n). So probably 3x better just by >> improving >> the serial fractions. But with a smarter combination step, we >> can do >> better still. If we have a "concatenated list view" >> operation (List >> concat(List a, List b)), which returns a read-only, conc-tree >> representation), then the big serial stage goes away. >> >> And, of course, building atop reduce makes the whole thing >> simpler; >> there are fewer ops that have their own distinct semantics, >> and the >> semantics of into() is about as weird as you get. >> >> >> Now that the tabulators framework gets users comfortable with >> the >> explicit choice between functional and concurrent aggregation >> for >> tabulation, it is a much shorter hop to get there. So let's >> build on >> that and find some sort of way to surface mutable and concurrent >> versions of "into". (Currently we have no good concurrent >> list-shaped >> collections, but hopefully that changes.) >> >> Something like: >> >> Stream.tabulate(collector(__ArrayList::new)) >> Stream.tabulate(__concurrentCollector(__ConcurrentFooList::new)) >> >> Maybe with some rename of tabulate. >> >> I think there's a small reorganization of naming lurking here >> (involving >> tabulate, Tabulator, ConcurrentTabulator, MutableReducer, >> reduce) that >> recasts into() either as an explicit functional or concurrent >> tabulation. And one more tricky+slow special-purpose op bites >> the dust, >> in favor of something that builds on our two favorite >> primitives, fold >> (order-preserving) and forEach (not order-preserving.) 
>> From brian.goetz at oracle.com Sat Dec 22 09:46:12 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 22 Dec 2012 12:46:12 -0500 Subject: Into In-Reply-To: <50D5EE04.1010602@univ-mlv.fr> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> Message-ID: <50D5F1E4.7020606@oracle.com> I was not proposing "let's replace into with toList". I already proposed in a previous message "let's replace into() with something whose semantics are more like reduce/forEach rather than the current weird semantics". The recent developments in reduce give us a consistent framework for doing that. Into is bad. It seemed clever at the time we first thought of it (turning the problem around on the target collection), but if you look at its semantics and its implementation, it's complicated, it's neither-here-nor-there, it's unlike any other op, and the obvious usages will likely parallelize terribly. Something that is more reduce-like (or forEach-like with a concurrent collection, if we had one) will be much easier to understand AND will perform better. So the question is not "should we replace into with toList." We have to have a discussion about what we replace into() with. This query was about whether sequential() still has a use other than as a (bad) implementation crutch for into(). And it seems that there is none, which is nice -- one less weird operation, replaced with one more version of the reduce swiss-army-knife. If there is a toList() it will (obviously) be sugar on top of something more general. On 12/22/2012 12:29 PM, Remi Forax wrote: > On 12/22/2012 06:16 PM, Brian Goetz wrote: >> Right. You want to do the upstream stuff in parallel, then you want to >> do the downstream stuff (a) serially, (b) in the current thread, and >> probably (c) in encounter order. 
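The "reduce swiss-army-knife" version of into() can be sketched as an explicit mutable reduction -- supply a leaf container, an accumulator, and a combiner. This three-function shape is one reading of the MutableReducer idea; the spelling via collect is assumed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class MutableReduceInto {
    public static void main(String[] args) {
        // Each leaf of the parallel computation gets its own ArrayList
        // (no shared mutable state), and lists are merged going up the
        // tree -- order-preserving, no sequential() barrier required.
        List<String> result = Stream.of("a", "b", "c", "d")
                .parallel()
                .collect(ArrayList::new,     // supplier: fresh leaf container
                         ArrayList::add,     // accumulator: element into leaf
                         ArrayList::addAll); // combiner: merge leaves upward
        System.out.println(result); // [a, b, c, d]
    }
}
```

The concurrent variant would instead bombard one shared concurrent container from a forEach, trading encounter order for less merging.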
>> >> So, assume for sake of discussion that we have some form of .toList(), >> whether as a "native" operation or some sort of >> reduce/combine/tabulate. Then you can say: >> >> parallelStream()...toList().forEach(...) >> >> and the list-building will happen in parallel and then forEach can >> help sequentially. >> >> Given that, is there any reason left for sequential()? > > You need also a toSet() and yes, in that case you don't need > sequential() and it's better because it's more explicit. > But there is still a lot of cases where you are sequential and you want > to control the destination collection, > that's why we need into(). > > I don't think it's a good idea to oppose toList() and into() both have > their purposes. > > R?mi > >> >> >> >> On 12/22/2012 11:55 AM, Joe Bowbeer wrote: >>> The main use case is sequential().forrEach(), which inserts any ol' >>> for-loop into a computation. >>> >>> On Dec 21, 2012 1:48 PM, "Brian Goetz" >> > wrote: >>> >>> If we get rid of into(), and replace it with an explicit reduction >>> [1], then we may be able to get rid of sequential() too. >>> >>> The primary use case for sequential() was for using non-thread-safe >>> Collections as an into() target. The convoluted into() turns the >>> problem around on the target, calling target.addAll(stream) on it, >>> and the typical implementation of into() is like the one in >>> Collection: >>> >>> default void addAll(Stream stream) { >>> if (stream.isParallel()) >>> stream = stream.sequential(); >>> stream.forEach(this::add); >>> } >>> >>> or more compactly >>> >>> default void addAll(Stream stream) { >>> stream.sequential().forEach(__this::add); >>> } >>> >>> since sequential() is now a no-op on sequential streams. >>> >>> The code looks pretty, but the implementation is not; sequential() >>> is a barrier, meaning you have to stop and collect all the elements >>> into a temporary tree, and then dump them into the target. 
But it >>> is not obvious that it is a barrier, so people will be surprised. >>> (And on infinite streams, it's bad.) >>> >>> What used to be implicit in sequential() can now be made explicit >>> with: >>> >>> if (stream.isParallel()) >>> stream = >>> stream...__whateverWeCallOrderPreservingI__nto().stream() >>> >>> That offers similar semantics and at least as good performance, >>> while also being more transparent and requiring one fewer weird >>> stream operation. >>> >>> (I don't yet see the way to getting rid of .parallel(), but we can >>> possibly move it out of Stream and into a static method >>> Streams.parallel(stream), at some loss of discoverability. But we >>> can discuss.) >>> >>> >>> [1] Actually, we have to replace it with two explicit reductions, or >>> more precisely, a reduction and a for-eaching. One is the pure >>> reduction case that involves merging, and is suitable for >>> non-thread-safe collections (and required if order preservation is >>> desired); the other is the concurrent case, where we bombard a >>> concurrent collection with puts and hope it manages to sort them >>> out. The two are semantically very different; one is a reduce and >>> the other is a forEach, and so they should have different >>> manifestations in the code. Though there are really no concurrent >>> Collections right now anyway (though you could fake a concurrent Set >>> with a concurrent Map.) >>> >>> >>> >>> On 12/21/2012 12:50 PM, Brian Goetz wrote: >>> >>> I'm starting to dislike "into". >>> >>> First, it's the only stream method which retains mutable state >>> from the >>> user. That's not great. >>> >>> Second, the parallel story is bad. People are going to write >>> >>> list.parallel(e -> e+1).into(new ArrayList<>()); >>> >>> which will do a whole lot of trivial computation in parallel, >>> wait on >>> the barrier implicit in sequential(), and then do an O(n) serial >>> thing. 
>>> >>> Third, the semantics are weird; we do this clever trick where >>> collections have to decide whether to do insertion in serial or >>> parallel. But as we all learned from Spinal Tap, there's a fine >>> line >>> between clever and stupid. >>> >>> Instead, we could treat this like a mutable reduce, where >>> leaves are >>> reduced to a List, and lists are merged as we go up the tree. >>> Even with >>> dumb merging is still going to be much faster than what we've >>> got now; >>> no barrier, no buffer the whole thing and copy, and the worst >>> serial >>> step is O(n/2) instead of O(n). So probably 3x better just by >>> improving >>> the serial fractions. But with a smarter combination step, we >>> can do >>> better still. If we have a "concatenated list view" >>> operation (List >>> concat(List a, List b)), which returns a read-only, conc-tree >>> representation), then the big serial stage goes away. >>> >>> And, of course, building atop reduce makes the whole thing >>> simpler; >>> there are fewer ops that have their own distinct semantics, >>> and the >>> semantics of into() is about as weird as you get. >>> >>> >>> Now that the tabulators framework gets users comfortable with >>> the >>> explicit choice between functional and concurrent aggregation >>> for >>> tabulation, it is a much shorter hop to get there. So let's >>> build on >>> that and find some sort of way to surface mutable and concurrent >>> versions of "into". (Currently we have no good concurrent >>> list-shaped >>> collections, but hopefully that changes.) >>> >>> Something like: >>> >>> Stream.tabulate(collector(__ArrayList::new)) >>> Stream.tabulate(__concurrentCollector(__ConcurrentFooList::new)) >>> >>> Maybe with some rename of tabulate. >>> >>> I think there's a small reorganization of naming lurking here >>> (involving >>> tabulate, Tabulator, ConcurrentTabulator, MutableReducer, >>> reduce) that >>> recasts into() either as an explicit functional or concurrent >>> tabulation. 
And one more tricky+slow special-purpose op bites >>> the dust, >>> in favor of something that builds on our two favorite >>> primitives, fold >>> (order-preserving) and forEach (not order-preserving.) >>> >>> > From forax at univ-mlv.fr Sat Dec 22 09:55:56 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 22 Dec 2012 18:55:56 +0100 Subject: Into In-Reply-To: <50D5F1E4.7020606@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> Message-ID: <50D5F42C.9000006@univ-mlv.fr> On 12/22/2012 06:46 PM, Brian Goetz wrote: > I was not proposing "let's replace into with toList". I already > proposed in a previous message "let's replace into() with something > whose semantics are more like reduce/forEach rather than the current > weird semantics". The recent developments in reduce give us a > consistent framework for doing that. You want to replace into() with a kind of reducer, and toList() is a way to hide the implementation of that kind of reducer, so ... > > Into is bad. It seemed clever at the time we first thought of it > (turning the problem around on the target collection), but if you look > at its semantics and its implementation, it's complicated, it's > neither-here-nor-there, it's unlike any other op, and the obvious > usages will likely parallelize terribly. Something that is more > reduce-like (or forEach-like with a concurrent collection, if we had > one) will be much easier to understand AND will perform better. I understand why into() is bad when you call parallel() on the stream, but it's also very useful if you don't call parallel on a stream, because at the boundary of methods people can use collections and are able to choose the implementation. > > So the question is not "should we replace into with toList." We have > to have a discussion about what we replace into() with. 
This query > was about whether sequential() still has a use other than as a (bad) > implementation crutch for into(). And it seems that there is none, > which is nice -- one less weird operation, replaced with one more > version of the reduce swiss-army-knife. > > If there is a toList() it will (obviously) be sugar on top of > something more general. and here you agree with me that toList() on a parallel stream is a way to abstract the implementation of some magic concurrent reducer. R?mi > > > On 12/22/2012 12:29 PM, Remi Forax wrote: >> On 12/22/2012 06:16 PM, Brian Goetz wrote: >>> Right. You want to do the upstream stuff in parallel, then you want to >>> do the downstream stuff (a) serially, (b) in the current thread, and >>> probably (c) in encounter order. >>> >>> So, assume for sake of discussion that we have some form of .toList(), >>> whether as a "native" operation or some sort of >>> reduce/combine/tabulate. Then you can say: >>> >>> parallelStream()...toList().forEach(...) >>> >>> and the list-building will happen in parallel and then forEach can >>> help sequentially. >>> >>> Given that, is there any reason left for sequential()? >> >> You need also a toSet() and yes, in that case you don't need >> sequential() and it's better because it's more explicit. >> But there is still a lot of cases where you are sequential and you want >> to control the destination collection, >> that's why we need into(). >> >> I don't think it's a good idea to oppose toList() and into() both have >> their purposes. >> >> R?mi >> >>> >>> >>> >>> On 12/22/2012 11:55 AM, Joe Bowbeer wrote: >>>> The main use case is sequential().forrEach(), which inserts any ol' >>>> for-loop into a computation. >>>> >>>> On Dec 21, 2012 1:48 PM, "Brian Goetz" >>> > wrote: >>>> >>>> If we get rid of into(), and replace it with an explicit reduction >>>> [1], then we may be able to get rid of sequential() too. 
>>>> >>>> The primary use case for sequential() was for using >>>> non-thread-safe >>>> Collections as an into() target. The convoluted into() turns the >>>> problem around on the target, calling target.addAll(stream) on it, >>>> and the typical implementation of into() is like the one in >>>> Collection: >>>> >>>> default void addAll(Stream stream) { >>>> if (stream.isParallel()) >>>> stream = stream.sequential(); >>>> stream.forEach(this::add); >>>> } >>>> >>>> or more compactly >>>> >>>> default void addAll(Stream stream) { >>>> stream.sequential().forEach(__this::add); >>>> } >>>> >>>> since sequential() is now a no-op on sequential streams. >>>> >>>> The code looks pretty, but the implementation is not; sequential() >>>> is a barrier, meaning you have to stop and collect all the >>>> elements >>>> into a temporary tree, and then dump them into the target. But it >>>> is not obvious that it is a barrier, so people will be surprised. >>>> (And on infinite streams, it's bad.) >>>> >>>> What used to be implicit in sequential() can now be made explicit >>>> with: >>>> >>>> if (stream.isParallel()) >>>> stream = >>>> stream...__whateverWeCallOrderPreservingI__nto().stream() >>>> >>>> That offers similar semantics and at least as good performance, >>>> while also being more transparent and requiring one fewer weird >>>> stream operation. >>>> >>>> (I don't yet see the way to getting rid of .parallel(), but we can >>>> possibly move it out of Stream and into a static method >>>> Streams.parallel(stream), at some loss of discoverability. But we >>>> can discuss.) >>>> >>>> >>>> [1] Actually, we have to replace it with two explicit >>>> reductions, or >>>> more precisely, a reduction and a for-eaching. 
One is the pure >>>> reduction case that involves merging, and is suitable for >>>> non-thread-safe collections (and required if order preservation is >>>> desired); the other is the concurrent case, where we bombard a >>>> concurrent collection with puts and hope it manages to sort them >>>> out. The two are semantically very different; one is a reduce and >>>> the other is a forEach, and so they should have different >>>> manifestations in the code. Though there are really no concurrent >>>> Collections right now anyway (though you could fake a >>>> concurrent Set >>>> with a concurrent Map.) >>>> >>>> >>>> >>>> On 12/21/2012 12:50 PM, Brian Goetz wrote: >>>> >>>> I'm starting to dislike "into". >>>> >>>> First, it's the only stream method which retains mutable state >>>> from the >>>> user. That's not great. >>>> >>>> Second, the parallel story is bad. People are going to write >>>> >>>> list.parallel(e -> e+1).into(new ArrayList<>()); >>>> >>>> which will do a whole lot of trivial computation in parallel, >>>> wait on >>>> the barrier implicit in sequential(), and then do an O(n) >>>> serial >>>> thing. >>>> >>>> Third, the semantics are weird; we do this clever trick where >>>> collections have to decide whether to do insertion in >>>> serial or >>>> parallel. But as we all learned from Spinal Tap, there's a >>>> fine >>>> line >>>> between clever and stupid. >>>> >>>> Instead, we could treat this like a mutable reduce, where >>>> leaves are >>>> reduced to a List, and lists are merged as we go up the tree. >>>> Even with >>>> dumb merging it is still going to be much faster than what we've >>>> got now; >>>> no barrier, no buffer-the-whole-thing-and-copy, and the worst >>>> serial >>>> step is O(n/2) instead of O(n). So probably 3x better just by >>>> improving >>>> the serial fractions. But with a smarter combination step, we >>>> can do >>>> better still.
If we have a "concatenated list view" >>>> operation (List >>>> concat(List a, List b), which returns a read-only, conc-tree >>>> representation), then the big serial stage goes away. >>>> >>>> And, of course, building atop reduce makes the whole thing >>>> simpler; >>>> there are fewer ops that have their own distinct semantics, >>>> and the >>>> semantics of into() is about as weird as you get. >>>> >>>> >>>> Now that the tabulators framework gets users comfortable with >>>> the >>>> explicit choice between functional and concurrent aggregation >>>> for >>>> tabulation, it is a much shorter hop to get there. So let's >>>> build on >>>> that and find some sort of way to surface mutable and >>>> concurrent >>>> versions of "into". (Currently we have no good concurrent >>>> list-shaped >>>> collections, but hopefully that changes.) >>>> >>>> Something like: >>>> >>>> Stream.tabulate(collector(ArrayList::new)) >>>> Stream.tabulate(concurrentCollector(ConcurrentFooList::new)) >>>> >>>> Maybe with some rename of tabulate. >>>> >>>> I think there's a small reorganization of naming lurking here >>>> (involving >>>> tabulate, Tabulator, ConcurrentTabulator, MutableReducer, >>>> reduce) that >>>> recasts into() either as an explicit functional or concurrent >>>> tabulation. And one more tricky+slow special-purpose op bites >>>> the dust, >>>> in favor of something that builds on our two favorite >>>> primitives, fold >>>> (order-preserving) and forEach (not order-preserving). From dl at cs.oswego.edu Sat Dec 22 10:55:02 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 22 Dec 2012 13:55:02 -0500 Subject: cumulate In-Reply-To: <50D4D527.1010102@oracle.com> References: <50D4CC64.4090105@oracle.com> <50D4D120.6070303@univ-mlv.fr> <50D4D527.1010102@oracle.com> Message-ID: <50D60206.5070503@cs.oswego.edu> On 12/21/12 16:31, Brian Goetz wrote: > It's gone. (Well, not gone. Mercurial history is still there.)
> > I propose this as the replacement: > > In Arrays: > void parallelPrefix(T[], int offset, int length, BinaryOperator<T>); > void parallelPrefix(int[], int offset, int length, IntBinaryOperator); > void parallelPrefix(long[], int offset, int length, LongBinaryOperator); > void parallelPrefix(double[], int offset, int length, DoubleBinaryOperator); > Actually, to be consistent with Arrays.sort (and other in-place methods in Arrays), it should use fromIndex, toIndex. The "T" and long versions pasted below. After Brian grabs and commits some stuff, it should all be in place.... -Doug /** * Cumulates in parallel each element of the given array in place, * using the supplied function. For example if the array initially * holds {@code [2, 1, 0, 3]} and the operation performs addition, * then upon return the array holds {@code [2, 3, 3, 6]}. * Parallel prefix computation is usually more efficient than * sequential loops for large arrays. * * @param array the array, which is modified in-place by this method * @param op the function to perform cumulations. The function * must be amenable to left-to-right application through the * elements of the array, as well as possible left-to-right * application across segments of the array. */ public static <T> void parallelPrefix(T[] array, BinaryOperator<T> op) { if (array.length > 0) new ArrayPrefixUtil.CumulateTask<T> (null, op, array, 0, array.length).invoke(); } /** * Performs {@link #parallelPrefix(Object[], BinaryOperator)} * for the given subrange of the array. * * @param array the array * @param fromIndex the index of the first element, inclusive * @param toIndex the index of the last element, exclusive * @param op the function to perform cumulations.
* @throws IllegalArgumentException if {@code fromIndex > toIndex} * @throws ArrayIndexOutOfBoundsException * if {@code fromIndex < 0} or {@code toIndex > array.length} */ public static <T> void parallelPrefix(T[] array, int fromIndex, int toIndex, BinaryOperator<T> op) { checkFromToBounds(array.length, fromIndex, toIndex); if (fromIndex < toIndex) new ArrayPrefixUtil.CumulateTask<T> (null, op, array, fromIndex, toIndex).invoke(); } /** * Cumulates in parallel each element of the given array in place, * using the supplied function. For example if the array initially * holds {@code [2, 1, 0, 3]} and the operation performs addition, * then upon return the array holds {@code [2, 3, 3, 6]}. * Parallel prefix computation is usually more efficient than * sequential loops for large arrays. * * @param array the array, which is modified in-place by this method * @param op the function to perform cumulations. The function * must be amenable to left-to-right application through the * elements of the array, as well as possible left-to-right * application across segments of the array. */ public static void parallelPrefix(long[] array, LongBinaryOperator op) { if (array.length > 0) new ArrayPrefixUtil.LongCumulateTask (null, op, array, 0, array.length).invoke(); } ... and others similarly... From brian.goetz at oracle.com Sat Dec 22 11:22:23 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 22 Dec 2012 14:22:23 -0500 Subject: cumulate In-Reply-To: <50D60206.5070503@cs.oswego.edu> References: <50D4CC64.4090105@oracle.com> <50D4D120.6070303@univ-mlv.fr> <50D4D527.1010102@oracle.com> <50D60206.5070503@cs.oswego.edu> Message-ID: <50D6086F.40901@oracle.com> Right, also we've added a few other methods in Arrays which should also be made consistent with that, which will require some adjustments elsewhere. What's left to grab and commit? I committed everything you sent this week so far. On 12/22/2012 1:55 PM, Doug Lea wrote: > On 12/21/12 16:31, Brian Goetz wrote: >> It's gone.
(Well, not gone. Mercurial history is still there.) >> >> I propose this as the replacement: >> >> In Arrays: >> void parallelPrefix(T[], int offset, int length, BinaryOperator<T>); >> void parallelPrefix(int[], int offset, int length, IntBinaryOperator); >> void parallelPrefix(long[], int offset, int length, >> LongBinaryOperator); >> void parallelPrefix(double[], int offset, int length, >> DoubleBinaryOperator); >> > > Actually, to be consistent with Arrays.sort (and other in-place methods > in Arrays), it should use fromIndex, toIndex. The "T" and long versions > pasted below. After Brian grabs and commits some stuff, it should all be in > place.... > > -Doug > > > > /** > * Cumulates in parallel each element of the given array in place, > * using the supplied function. For example if the array initially > * holds {@code [2, 1, 0, 3]} and the operation performs addition, > * then upon return the array holds {@code [2, 3, 3, 6]}. > * Parallel prefix computation is usually more efficient than > * sequential loops for large arrays. > * > * @param array the array, which is modified in-place by this method > * @param op the function to perform cumulations. The function > * must be amenable to left-to-right application through the > * elements of the array, as well as possible left-to-right > * application across segments of the array. > */ > public static <T> void parallelPrefix(T[] array, BinaryOperator<T> > op) { > if (array.length > 0) > new ArrayPrefixUtil.CumulateTask<T> > (null, op, array, 0, array.length).invoke(); > } > > /** > * Performs {@link #parallelPrefix(Object[], BinaryOperator)} > * for the given subrange of the array. > * > * @param array the array > * @param fromIndex the index of the first element, inclusive > * @param toIndex the index of the last element, exclusive > * @param op the function to perform cumulations.
> * @throws IllegalArgumentException if {@code fromIndex > toIndex} > * @throws ArrayIndexOutOfBoundsException > * if {@code fromIndex < 0} or {@code toIndex > array.length} > */ > public static <T> void parallelPrefix(T[] array, int fromIndex, > int toIndex, > BinaryOperator<T> op) { > checkFromToBounds(array.length, fromIndex, toIndex); > if (fromIndex < toIndex) > new ArrayPrefixUtil.CumulateTask<T> > (null, op, array, fromIndex, toIndex).invoke(); > } > > > /** > * Cumulates in parallel each element of the given array in place, > * using the supplied function. For example if the array initially > * holds {@code [2, 1, 0, 3]} and the operation performs addition, > * then upon return the array holds {@code [2, 3, 3, 6]}. > * Parallel prefix computation is usually more efficient than > * sequential loops for large arrays. > * > * @param array the array, which is modified in-place by this method > * @param op the function to perform cumulations. The function > * must be amenable to left-to-right application through the > * elements of the array, as well as possible left-to-right > * application across segments of the array. > */ > public static void parallelPrefix(long[] array, LongBinaryOperator > op) { > if (array.length > 0) > new ArrayPrefixUtil.LongCumulateTask > (null, op, array, 0, array.length).invoke(); > } > > ... and others similarly... > > From dl at cs.oswego.edu Sat Dec 22 12:21:33 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 22 Dec 2012 15:21:33 -0500 Subject: Into In-Reply-To: <50D5F1E4.7020606@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> Message-ID: <50D6164D.40307@cs.oswego.edu> I'm happy to see "into" go. But I'm not all the way sold on whether the best way is via more abstraction (Reducers, Tabulators, etc) versus less. Here's the "less" version.
I don't want to argue too hard for taking this approach, but figure that it is worth describing: For mutative additions to collections/maps, why not just let people use either forEach(...add...) or parallel-forEach(...add...), depending on whether the destination is concurrent and the source amenable to parallelism, and/or whether, in the case of Maps, they want put vs putIfAbsent vs merge. The idioms are easy, and clearly reflect mutative intent. (*) For more functional/fluent/streamy usages, you'd like to enforce that any into-ish method creates a fresh instance and that construction cannot interfere with anything else. So why not treat these as factories in existing categories: toCollection(), toList(), toRandomAccessList(), toSet(), toNavigableSet(); plus grouping versions toMap(keyFn, ...), toNavigableMap(keyFn, ...); (with merge- and multi- variants.) Here the streams implementation gets to pick the underlying concrete class. This may entail sensing parallelism, but only in the case of toList is it constrained by orderedness. The streams implementation could pick the best applicable options and improve them or pick others over time. (For example, initially, all of toCollection(), toList(), and toRandomAccessList() could pick ArrayList.) The implementation can also take advantage of the fact that some collections (especially ArrayList) support fast parallel insertion upon creation but not once published. People who don't like the choices of concrete classes can instead use the first option of creating some concrete collections and populating them. Summary: Replace into with: * manual forEach-based idioms for mutative stuff * opaque factory-based methods for fluent/functional stuff And triage out for now any other non-collections-based into targets. (*) footnote: There are now a few more options for parallel insertions into concurrent collections.
A while ago I added a "newKeySet" factory method to the JDK8/V8 version of CHM, so there is now a JDK Concurrent Set implementation that people can use. Someday similarly for skip lists, so there will be a concurrent sorted/navigable set. High-performance concurrent Lists are unlikely any time soon, although ReadMostlyVector is better than nothing. (I'm still not sure if it should move from jsr166e.extra into JDK...). -Doug From brian.goetz at oracle.com Sat Dec 22 14:15:37 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 22 Dec 2012 17:15:37 -0500 Subject: Into In-Reply-To: <50D6164D.40307@cs.oswego.edu> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> Message-ID: <50D63109.8030905@oracle.com> > For more functional/fluent/streamy usages, you'd like > to enforce that any into-ish method creates a fresh instance > and that construction cannot interfere with anything else. +1. (This was my point #1 from the original note.) > So why not treat these as factories in existing categories: > toCollection(), toList(), toRandomAccessList() > toSet(), toNavigableSet(); > plus grouping versions > toMap(keyFn, ...), toNavigableMap(keyFn, ...); I like the convenience of these -- they meet the needs of many users without forcing people to learn about the tabulators framework (or whatever we call it). And also the flexibility of being able to provide better implementations over time. But I prefer they be sugar on top of something more general. So let's not let the surface simplicity of the simple case stop us from addressing the general case. (One thing I don't love is that they bring List/Map back into the API -- after all the work we did in groupBy to move it out. But this is more superficial than what groupBy/reduceBy did, so I'm probably OK with this -- especially if they are just sugar.)
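Brian's "sugar on top of something more general" can be sketched with the three-function mutable reduce that eventually shipped as Stream.collect; the toList helper below is illustrative, not the draft API under discussion:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class ToListAsSugar {
    // A toList() convenience expressed as sugar over the general
    // supplier/accumulator/combiner mutable reduce.
    static <T> List<T> toList(Stream<T> stream) {
        return stream.collect(ArrayList::new,   // fresh, unpublished container per leaf
                              List::add,        // accumulate an element into a leaf
                              List::addAll);    // merge sibling results going up the tree
    }

    public static void main(String[] args) {
        System.out.println(toList(Stream.of("a", "b", "c")));
    }
}
```

Leaves accumulate into private ArrayLists via List::add, and sibling results merge on the way up the tree via List::addAll -- no barrier, and the worst serial step is the final merge.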
> The implementation can also take advantage of the fact > that some collections (especially ArrayList) support fast > parallel insertion upon creation but not once published. Though that unfortunately relies on implementation details rather than the spec for ArrayList. From forax at univ-mlv.fr Sat Dec 22 14:24:23 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 22 Dec 2012 23:24:23 +0100 Subject: Into In-Reply-To: <50D6164D.40307@cs.oswego.edu> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> Message-ID: <50D63317.7020306@univ-mlv.fr> On 12/22/2012 09:21 PM, Doug Lea wrote: > > I'm happy to see "into" go. But I'm not all the way sold on whether > the best way is via more abstraction (Reducers, Tabulators, etc) > versus less. Here's the "less" version. > I don't want to argue too hard for taking this approach, > but figure that is worth describing: > > For mutative additions to collections/maps, why not > just let people use either forEach(...add...) or > parallel-forEach(...add...), depending on > whether the destination is concurrent and the source > amenable to parallelism, and/or whether, in the case of > Maps, they want put vs putIfAbsent vs merge. The idioms > are easy, and clearly reflect mutative intent. (*) You're asking people to take care of the concurrency in their code instead of letting the pipeline take care of that. While it should be possible (that's why there is a forEach), it should not be the default idiom, because people will write ArrayList list = new ArrayList<>(); list.parallel().filter(...).forEach(list::add); We should prefer a slow into() to a fast forEach() that only works if your name is Doug or Brian. > > For more functional/fluent/streamy usages, you'd like > to enforce that any into-ish method creates a fresh instance > and that construction cannot interfere with anything else.
> So why not treat these as factories in existing categories: > toCollection(), toList(), toRandomAccessList() > toSet(), toNavigableSet(); > plus grouping versions > toMap(keyFn, ...), toNavigableMap(keyFn, ...); > (with merge- and multi- variants.) > > Here the streams implementation gets to pick the > underlying concrete class. This may entail sensing > parallelism, but only in the case of toList is it > constrained by orderedness. > The streams implementation could pick the best applicable > options and improve them or pick others over time. > (For example, initially, all of toCollection(), toList(), > and toRandomAccessList() could pick ArrayList.) > The implementation can also take advantage of the fact > that some collections (especially ArrayList) support fast > parallel insertion upon creation but not once published. you can't use arrayList.addAll(arrayList2) ? > > People who don't like the choices of concrete classes > can instead use the first option of creating some concrete > collections and populating them. > > Summary: Replace into with: > * manual forEach-based idioms for mutative stuff > * opaque factory-based methods for fluent/functional stuff > And triage out for now any other non-collections-based into targets One thing that can be done is to write into that way: into(Supplier supplier) and use it that way: stream.into(ArrayList::new) The implementation of the pipeline can check if the Supplier is a constructor reference to a well-known collection of the JDK and optimize. But in that case, users cannot use Collections.newSetFromMap. > > (*) footnote: There are now a few more options for > parallel insertions into concurrent collections.
> High-performance concurrent Lists are unlikely any > time soon, although ReadMostlyVector is better > than nothing. (I'm still not sure if it should move > from jsr166e.extra into JDK...). > > -Doug > > Rémi From dl at cs.oswego.edu Sun Dec 23 04:02:27 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 23 Dec 2012 07:02:27 -0500 Subject: Into In-Reply-To: <50D63317.7020306@univ-mlv.fr> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> Message-ID: <50D6F2D3.1030409@cs.oswego.edu> On 12/22/12 17:24, Remi Forax wrote: > On 12/22/2012 09:21 PM, Doug Lea wrote: >> For mutative additions to collections/maps, why not >> just let people use either forEach(...add...) or >> parallel-forEach(...add...), > > You're asking people to take care of the concurrency in their code instead of > letting the pipeline take care of that. Only for mutative updates, for which they will need to take the same care in any choice of seq vs par for any use of forEach. So there is nothing much special/interesting about this. The main idea is to be uniform about how mutative constructions are less fluent/streamy-looking than functional usages. >> >> For more functional/fluent/streamy usages, you'd like >> to enforce that any into-ish method creates a fresh instance >> and that construction cannot interfere with anything else. >> So why not treat these as factories in existing categories: >> toCollection(), toList(), toRandomAccessList() >> toSet(), toNavigableSet(); >> plus grouping versions >> toMap(keyFn, ...), toNavigableMap(keyFn, ...); >> ... support fast parallel insertion upon creation but not once published. > ... > you can't use arrayList.addAll(arrayList2) ? Only if ArrayList had a built-in-parallel method/mode. Which it could.
-Doug From dl at cs.oswego.edu Sun Dec 23 04:51:01 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 23 Dec 2012 07:51:01 -0500 Subject: Into In-Reply-To: <50D63109.8030905@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63109.8030905@oracle.com> Message-ID: <50D6FE35.2000400@cs.oswego.edu> On 12/22/12 17:15, Brian Goetz wrote: > (One thing I don't love is that they bring List/Map back into the API -- after > all the work we did in groupBy to move it out. But this is more superficial > than what groupBy/reduceBy did, so I'm probably OK with this -- especially if > they are just sugar.) My (long-standing :-) concern is the potential bug-tail of methods that introduce one more level of indirection on the already intractable binary-method problem. Method into() is asked to automagically resolve any mismatches between properties of the source and of the destination; where the properties mainly include ordered and parallel, but also keyed, sorted, indexed, random-access. The Collections framework is not especially good at helping you out here. One possibility for reducing complexity is to limit the number of cases, via toList, toSet, etc and allow the implementation to rely on particular properties of the internally chosen destination classes. This is not very exciting, but it also does not rule out doing something more ambitious later. Another strategy (a variant of one discussed before) for collapsing one side of the binary method problem would be to somehow require that concrete Collection/Map classes support a stream-based constructor. This encounters a couple of snags though: (1) We can't require or default-define constructors, so would have to live with say void buildFromStream(Stream x) (default: if !empty die else sequential-forEach-add). 
(2) Usages lose the fluency look-and-feel, which bothers some people a lot: usages must declare and use the destination in a separate statement from the pipeline expression. I'd probably prefer this approach, in part because it does fully remove collections from the stream framework (by adding stream support to the collections framework, which we must do anyway). But I realize that the fluent-API expectations for streams make it an unpopular choice. Footnote: Recall that Scala deals with the binary-method-problem aspects by encoding some of these constraints in the type system and resolving a match. Which is sorta cool but still produces anomalies. -Doug From dl at cs.oswego.edu Sun Dec 23 06:35:03 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 23 Dec 2012 09:35:03 -0500 Subject: Into In-Reply-To: <50D6F2D3.1030409@cs.oswego.edu> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> Message-ID: <50D71697.5020008@cs.oswego.edu> On 12/23/12 07:02, Doug Lea wrote: > >> You're asking people to take care of the concurrency in their code instead of >> letting the pipeline take care of that. > > Only for mutative updates, for which they will need to take > the same care in any choice of seq vs par for any use of forEach. > So there is nothing much special/interesting about this. > The main idea is to be uniform about how mutative constructions > are less fluent/streamy-looking than functional usages. > (To continue to re-open old wounds :-) For extra Bondage&Discipline/friendly-guidance, we could always re-choose to separately support forEach and parallelForEach methods and get rid of implicit moding for forEach. Implicit moding can never hurt you in this sense for the functional/stateless operations.
There are still several other stateful ones that would require some similar separation though. Which might be a variant of what Brian was suggesting a few days ago? -Doug From brian.goetz at oracle.com Sun Dec 23 07:33:18 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 23 Dec 2012 10:33:18 -0500 Subject: Into In-Reply-To: <50D71697.5020008@cs.oswego.edu> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> Message-ID: <50D7243E.5000803@oracle.com> It is sort of a variant of what I've been suggesting, though not quite going as far as serialForEach/parallelForEach. In my view the modes that need to be explicitly chosen between are reduce/functional/associative/order-sensitive vs forEach/mutative/commutative/order-insensitive (This distinction only makes a difference in the parallel case, so it is slightly different from the forEach/forEachParallel distinction, but in a similar spirit of "less mind reading.") On 12/23/2012 9:35 AM, Doug Lea wrote: > On 12/23/12 07:02, Doug Lea wrote: >> >>> You're asking people to take care of the concurrency in their code >>> instead of >>> letting the pipeline take care of that. >> >> Only for mutative updates, for which they will need to take >> the same care in any choice of seq vs par for any use of forEach. >> So there is nothing much special/interesting about this. >> The main idea is to be uniform about how mutative constructions >> are less fluent/streamy-looking than functional usages. >> > > (To continue to re-open old wounds :-) > For extra Bondage&Discipline/friendly-guidance, we could always > re-choose to separately support forEach and parallelForEach methods > and get rid of implicit moding for forEach.
> Implicit moding can never hurt you in this sense > for the functional/stateless operations. There are still > several other stateful ones that would require some similar > separation though. > > Which might be a variant of what Brian was suggesting a few days ago? > > > -Doug From brian.goetz at oracle.com Sun Dec 23 09:44:48 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 23 Dec 2012 12:44:48 -0500 Subject: Foo.Of{Int,Long,Double} naming convention Message-ID: <50D74310.3090709@oracle.com> For types that have primitive specializations that are subtypes of the base type (e.g., MutableReducer), we've been converging on a naming convention that puts the subtypes as nested interfaces. For example: interface MutableReducer<T> { // reducer methods interface OfInt extends MutableReducer<Integer> { ... } interface OfLong extends MutableReducer<Long> { ... } } The motivation here is that it (a) reduces the javadoc surface area, (b) groups related abstractions together, (c) makes it clear that these are subsidiary abstractions, and (d) keeps the cut-and-paste stuff together in the code. The use site also looks pretty reasonable: class Foo implements MutableReducer.OfInt { ... } This shows up in Sink, IntermediateOp, TerminalOp, MutableReducer, NodeBuilder, Node, Spliterator, etc. (It also shows up in concrete implementation classes like ForEachOp, MatchOp, and FindOp, though these will not be public and so (a) doesn't really apply to these.) Are we OK with this convention? It seems to have a tidying effect on the codebase, the documentation, and client usage, and mitigates the pain of the primitive specializations. (There will be grey areas where its applicability is questionable; we can discuss those individually, but there are a lot of things that it works for.)
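A minimal sketch of the nesting convention, using a hypothetical Reducer interface (all names here are illustrative, not the draft API); the default method shows the boxed-to-primitive bridge that the OfInt shape permits:

```java
class NestingConvention {
    // Hypothetical base type with a primitive specialization nested inside it,
    // following the Foo.OfInt convention under discussion.
    interface Reducer<T> {
        T reduce(T left, T right);

        interface OfInt extends Reducer<Integer> {
            int reduceAsInt(int left, int right);

            // Bridge: the boxed method delegates to the primitive one,
            // so OfInt can be a true subtype of Reducer<Integer>.
            @Override
            default Integer reduce(Integer left, Integer right) {
                return reduceAsInt(left, right);
            }
        }
    }

    // The use site reads as Reducer.OfInt, and it is a functional interface.
    static final Reducer.OfInt SUM = (a, b) -> a + b;

    public static void main(String[] args) {
        System.out.println(SUM.reduceAsInt(2, 3)); // primitive path
        System.out.println(SUM.reduce(2, 3));      // boxed path, same answer
    }
}
```

The "asInt" suffix on the primitive method is what avoids the overload/box clash that Doug raises in his reply.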
From tim at peierls.net Sun Dec 23 09:54:03 2012 From: tim at peierls.net (Tim Peierls) Date: Sun, 23 Dec 2012 12:54:03 -0500 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: <50D74310.3090709@oracle.com> References: <50D74310.3090709@oracle.com> Message-ID: On Sun, Dec 23, 2012 at 12:44 PM, Brian Goetz wrote: > [E.g.:] class Foo implements MutableReducer.OfInt { ... } > > Are we OK with this convention? I'm more than OK with it. --tim From dl at cs.oswego.edu Mon Dec 24 04:27:34 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 24 Dec 2012 07:27:34 -0500 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: <50D74310.3090709@oracle.com> References: <50D74310.3090709@oracle.com> Message-ID: <50D84A36.9020906@cs.oswego.edu> On 12/23/12 12:44, Brian Goetz wrote: > For types that have primitive specializations that are subtypes of the base type > (e.g., MutableReducer), we've been converging on a naming convention that puts > the subtypes as nested interfaces. For example: > > interface MutableReducer<T> { > > // reducer methods > > interface OfInt extends MutableReducer<Integer> { ... } > interface OfLong extends MutableReducer<Long> { ... } > } > > > Are we OK with this convention? I'm not positive that it will hold up well without further conventions, since it encounters the same overload/box issues that led to different conventions for plain function types. Consider using this for Function: interface Function<A, R> { R apply(A a); interface OfInt<A> extends Function<A, Integer> { int apply(A a); // nope } } Which didn't quite work. One way out was equivalent to: interface Function<A, R> { R apply(A a); interface OfInt<A> extends Function<A, Integer> { default Integer apply(A a) { return applyAsInt(a); } int applyAsInt(A a); } } Which is itself a little fragile.
The current versions of Function etc don't bother to declare any subtyping. But if it is the best we can do, the "asInt" ought to be appended to all similar methods in OfInt interfaces? -Doug From forax at univ-mlv.fr Mon Dec 24 07:47:41 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 24 Dec 2012 16:47:41 +0100 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: <50D74310.3090709@oracle.com> References: <50D74310.3090709@oracle.com> Message-ID: <50D8791D.4060907@univ-mlv.fr> On 12/23/2012 06:44 PM, Brian Goetz wrote: > For types that have primitive specializations that are subtypes of the > base type (e.g., MutableReducer), we've been converging on a naming > convention that puts the subtypes as nested interfaces. For example: > > interface MutableReducer<T> { > > // reducer methods > > interface OfInt extends MutableReducer<Integer> { ... } > interface OfLong extends MutableReducer<Long> { ... } > } > > The motivation here is that it (a) reduces the javadoc surface area, (b) groups > related abstractions together, (c) makes it clear that these are > subsidiary abstractions, and (d) keeps the cut-and-paste stuff together > in the code. The use site also looks pretty reasonable: > > class Foo implements MutableReducer.OfInt { ... } > > This shows up in Sink, IntermediateOp, TerminalOp, MutableReducer, > NodeBuilder, Node, Spliterator, etc. (It also shows up in concrete > implementation classes like ForEachOp, MatchOp, and FindOp, though > these will not be public and so (a) doesn't really apply to these.) > > Are we OK with this convention? It seems to have a tidying effect on > the codebase, the documentation, and client usage, and mitigates the > pain of the primitive specializations. (There will be grey areas > where its applicability is questionable; we can discuss those > individually, but there are a lot of things that it works for.) First, it's far from clear to me that MutableReducer.OfInt should inherit from MutableReducer.
Now on the naming, you forget that you can write import java.util.function.MutableReducer.OfInt; in that case, OfInt function; is unreadable but fully valid. Also, Eclipse adds that import automagically if you ask it to declare a local variable using CTRL+SHIFT+1. Rémi From tim at peierls.net Mon Dec 24 08:09:46 2012 From: tim at peierls.net (Tim Peierls) Date: Mon, 24 Dec 2012 11:09:46 -0500 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: <50D8791D.4060907@univ-mlv.fr> References: <50D74310.3090709@oracle.com> <50D8791D.4060907@univ-mlv.fr> Message-ID: On Mon, Dec 24, 2012 at 10:47 AM, Remi Forax wrote: > On 12/23/2012 06:44 PM, Brian Goetz wrote: > >> Now on the naming, you forget that you can write > > import java.util.function.MutableReducer.OfInt; > in that case, > OfInt function; > is unreadable but fully valid. > So don't do that. > Also, Eclipse adds that import automagically if you ask it to declare a > local variable using CTRL+SHIFT+1. > Fix Eclipse. --tim -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121224/98a570e1/attachment.html From forax at univ-mlv.fr Mon Dec 24 08:20:14 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 24 Dec 2012 17:20:14 +0100 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: References: <50D74310.3090709@oracle.com> <50D8791D.4060907@univ-mlv.fr> Message-ID: <50D880BE.4030905@univ-mlv.fr> On 12/24/2012 05:09 PM, Tim Peierls wrote: > On Mon, Dec 24, 2012 at 10:47 AM, Remi Forax > wrote: > > On 12/23/2012 06:44 PM, Brian Goetz wrote: > > Now on the naming, you forget that you can write > > import java.util.function.MutableReducer.OfInt; > in that case, > OfInt function; > is unreadable but fully valid. > > > So don't do that. Not introducing something that is harmful is a better idea.
> > > Also, Eclipse adds that import automagically if you ask it to > declare a local variable using CTRL+SHIFT+1. > > > Fix Eclipse. :) I want Eclipse to do that for Map.Entry > > --tim Rémi From brian.goetz at oracle.com Mon Dec 24 08:54:02 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 24 Dec 2012 11:54:02 -0500 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: References: <50D74310.3090709@oracle.com> <50D8791D.4060907@univ-mlv.fr> Message-ID: <50D888AA.5040001@oracle.com> You took the words right out of my mouth! On 12/24/2012 11:09 AM, Tim Peierls wrote: > On Mon, Dec 24, 2012 at 10:47 AM, Remi Forax > wrote: > > On 12/23/2012 06:44 PM, Brian Goetz wrote: > > Now on the naming, you forget that you can write > > import java.util.function.MutableReducer.OfInt; > in that case, > OfInt function; > is unreadable but fully valid. > > > So don't do that. > > > Also, Eclipse adds that import automagically if you ask it to > declare a local variable using CTRL+SHIFT+1. > > > Fix Eclipse. > > --tim From joe.bowbeer at gmail.com Mon Dec 24 09:25:08 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 24 Dec 2012 09:25:08 -0800 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: <50D74310.3090709@oracle.com> References: <50D74310.3090709@oracle.com> Message-ID: Stream.OfInt instead of IntStream? Do PrimitiveStreams methods become (top-level) methods in Streams? One disadvantage of this convention is that these names are hard to implement as extensions by 3rd parties. Suppose I need a ShortStream or a FloatStream, or an IntFoo for some Foo without a Foo.OfInt: I can't create Foo.OfInt myself, and a top-level IntFoo doesn't follow the naming convention. On Dec 23, 2012 9:45 AM, "Brian Goetz" wrote: > For types that have primitive specializations that are subtypes of the > base type (e.g., MutableReducer), we've been converging on a naming > convention that puts the subtypes as nested interfaces.
For example: >
>
>   interface MutableReducer<T> {
>       // reducer methods
>
>       interface OfInt extends MutableReducer<Integer> { ... }
>       interface OfLong extends MutableReducer<Long> { ... }
>   }
>
> The motivation here is to (a) reduce the javadoc surface area, (b) group > related abstractions together, (c) make it clear that these are subsidiary > abstractions, and (d) keep the cut-and-paste stuff together in the code. > The use site also looks pretty reasonable: > > class Foo implements MutableReducer.OfInt { ... } > > This shows up in Sink, IntermediateOp, TerminalOp, MutableReducer, > NodeBuilder, Node, Spliterator, etc. (It also shows up in concrete > implementation classes like ForEachOp, MatchOp, and FindOp, though these > will not be public and so (a) doesn't really apply to these.) > > Are we OK with this convention? It seems to have a tidying effect on the > codebase, the documentation, and client usage, and mitigates the pain of > the primitive specializations. (There will be grey areas where its > applicability is questionable; we can discuss those individually, but there > are a lot of things that it works for.) > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121224/1a600092/attachment.html From brian.goetz at oracle.com Mon Dec 24 09:30:48 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 24 Dec 2012 12:30:48 -0500 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: References: <50D74310.3090709@oracle.com> Message-ID: <50D89148.1000401@oracle.com> > Stream.OfInt instead of IntStream? Where to draw the line is hard. I did think about this but I think this is over the line; they're both big types, IntStream doesn't extend Stream, and IntStream is much more than a trivial specialization of Stream. Basically, I think the OfXxx works well when you wish you didn't have to have the types at all, and hiding them as nested types is an uneasy compromise.
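A compilable sketch of the nested-interface convention under discussion. MutableReducer's actual method set is not shown in this archive, so the reduce signature below is an assumed stand-in, not the lambda-repo source:

```java
// Hypothetical reconstruction of the Foo.OfInt pattern: the primitive
// specialization is a nested interface extending the boxed base type.
interface MutableReducer<T> {
    T reduce(T left, T right);

    interface OfInt extends MutableReducer<Integer> {
        int reduceAsInt(int left, int right);

        // Default bridge from the boxed method to the primitive one.
        default Integer reduce(Integer left, Integer right) {
            return reduceAsInt(left, right);
        }
    }
}

// The use site Brian shows: implement only the primitive method.
class IntSum implements MutableReducer.OfInt {
    public int reduceAsInt(int left, int right) { return left + right; }
}
```

Joe's third-party objection is visible here too: nothing outside MutableReducer's own source file can add a MutableReducer.OfShort, whereas a top-level ShortReducer could live anywhere.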
> Do PrimitiveStreams methods become (top-level) methods in Streams? Not sure what you mean? > One disadvantage of this convention is that these names are hard to > implement as extensions by 3rd parties. Suppose I need a ShortStream or > a FloatStream. Or an IntFoo for some Foo without a Foo.OfInt, then I > can't create Foo.OfInt, and IntFoo doesn't follow the naming convention. Right. Another reason why it fails for "big" abstractions. From joe.bowbeer at gmail.com Mon Dec 24 09:35:46 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 24 Dec 2012 09:35:46 -0800 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: <50D89148.1000401@oracle.com> References: <50D74310.3090709@oracle.com> <50D89148.1000401@oracle.com> Message-ID: I was second-thinking that the answer to my Streams questions is no because these are not subtypes. The PrimitiveStreams question was a follow-on, contingent on Stream.OfInt, etc. The naming question applies to e.g. FloatMapReducer On Dec 24, 2012 9:30 AM, "Brian Goetz" wrote: > Stream.OfInt instead of IntStream? >> > > Where to draw the line is hard. I did think about this but I think this > is over the line; they're both big types, IntStream doesn't extend Stream, > IntStream is much more than a trivial specialization of Stream. > > Basically, I think the OfXxx works well when you wish you didn't have to > have the types at all, and hiding them as nested types is an uneasy > compromise. > > Do PrimitiveStreams methods become (top-level) methods in Streams? >> > > Not sure what you mean? > > One disadvantage of this convention is that these names are hard to >> implement as extensions by 3rd parties. Suppose I need a ShortStream or >> a FloatStream. Or an IntFoo for some Foo without a Foo.OfInt, then I >> can't create Foo.OfInt, and IntFoo doesn't follow the naming convention. >> > > Right. Another reason why it fails for "big" abstractions. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121224/b928a6e4/attachment-0001.html From joe.bowbeer at gmail.com Mon Dec 24 15:25:23 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 24 Dec 2012 15:25:23 -0800 Subject: Into In-Reply-To: <50D7243E.5000803@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> Message-ID: I was going to say something about into() here but the topic has morphed to sequential()? Concerning into(), I was just translating a simple example from Uncle Bob's recent FP resolution(*). The most difficult problem given the current state of jdk8lambda was trying to print a stream... Using StringJoiner seems like the coolest way to do this currently: stream.into(new StringJoiner(", ", "[", "]")) But how's this supposed to work without into()? Btw, the lack of a generic Joiner that accepts any ol' object or primitive is causing me some grief. Given a stream of ints or even Integers, having to manually map(Object::toString) seems like something StringJoiner should be doing automatically. Joe (*) http://blog.8thlight.com/uncle-bob/2012/12/22/FPBE1-Whats-it-all-about.html On Sun, Dec 23, 2012 at 7:33 AM, Brian Goetz wrote: > It is sort of a variant of what I've been suggesting, though not quite > going as far as serialForEach/parallelForEach.
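These builds predate the final API, but for reference: the into(new StringJoiner(...)) idiom survives in the released JDK 8 as a Collector, still with the explicit toString mapping Joe complains about. A sketch:

```java
import java.util.StringJoiner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JoinDemo {
    public static void main(String[] args) {
        // Released-JDK-8 equivalent of stream.into(new StringJoiner(", ", "[", "]")):
        // joining is a Collector, and the toString step is still manual.
        String s = Stream.of(1, 2, 3)
                .map(Object::toString)
                .collect(Collectors.joining(", ", "[", "]"));
        System.out.println(s); // [1, 2, 3]

        // Or drive a StringJoiner directly with an order-preserving terminal op.
        StringJoiner sj = new StringJoiner(", ", "[", "]");
        Stream.of(1, 2, 3).forEachOrdered(i -> sj.add(i.toString()));
        System.out.println(sj); // [1, 2, 3]
    }
}
```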
In my view the modes that > need to be explicitly chosen between are > > reduce/functional/associative/order-sensitive > vs > forEach/mutative/commutative/order-insensitive > > (This distinction only makes a difference in the parallel case, so it is > slightly different from the forEach/forEachParallel distinction, but in a > similar spirit of "less mind reading.") > > On 12/23/2012 9:35 AM, Doug Lea wrote: >> On 12/23/12 07:02, Doug Lea wrote: >>> >>>> You're asking people to take care about the concurrency in their code >>>> instead of >>>> letting the pipeline take care of that. >>>> >>> >>> Only for mutative updates, for which they will need to take >>> the same care in any choice of seq vs par for any use of forEach. >>> So there is nothing much special/interesting about this. >>> The main idea is to be uniform about how mutative constructions >>> are less fluent/streamy-looking than functional usages. >>> >> >> (To continue to re-open old wounds :-) >> For extra Bondage&Discipline/friendly-guidance, we could always >> re-choose to separately support forEach and parallelForEach methods >> and get rid of implicit moding for forEach. >> Implicit moding can never hurt you in this sense >> for the functional/stateless operations. There are still >> several other stateful ones that would require some similar >> separation though. >> >> Which might be a variant of what Brian was suggesting a few days ago? >> >> >> -Doug >> >> -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121224/619295ab/attachment.html From joe.bowbeer at gmail.com Mon Dec 24 17:01:51 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 24 Dec 2012 17:01:51 -0800 Subject: unordered() In-Reply-To: <50D4DF22.4070805@oracle.com> References: <50D4DF22.4070805@oracle.com> Message-ID: Continuing with Uncle Bob's FP snippet: (take 25 (squares-of (integers))) I can code this in jdk8lambda-b69 as:

  Function<Integer, Integer> square = (Integer i) -> i * i;
  Stream<Integer> is = Streams.iterate(0, i -> i + 1).map(square).limit(25);

First, *note* that the following simplification won't compile because the RHS creates an IntStream:

  Stream<Integer> is = Streams.iterate(0, i -> i + 1).map((Integer i) -> i * i).limit(25);

This is a bug, right? Now something about unordered().. When I add parallel() before the map() and print the result, I find that the into() form creates ordered results no matter where I insert unordered() in the pipeline, whereas forEach() already produces unordered results. 1. Always ordered, regardless of unordered(): out.print(is.map(Object::toString).into(new StringJoiner(" "))); => 0 1 4 9 ... 441 484 529 576 2. Naturally unordered: is.forEach(i -> out.print(i + " ")); => 0 529 576 441 ... 9 16 1 4 Seems weird. -Joe On Fri, Dec 21, 2012 at 2:13 PM, Brian Goetz wrote: > So, the move to a more explicit choice of merging or concurrent tabulation > also reduces (heh) the need for unordered(), though it does not eliminate > it completely. (Limit, cancelation, and duplicate removal all have > optimized versions if encounter order is not significant.) > > Kevin pointed out that .unordered() is pretty easy to miss, and people > will not know that they don't know about it.
One possibility is to make it > more explicit at one end of the pipeline or the other (the only operation > that is order-injecting is sorted(), and presumably if you are sorting you > really care about encounter order for the downstream ops; otherwise the > sort was a waste of time.) > > The proposed tabulator / reducer stuff makes the order-sensitivity clear > at the tail end, which is a good place to put it -- the user should know > whether a reduce or a forEach is what they want -- if not the user, who? > (Only the user knows whether he cares about order or not, and knows > whether his combination functions are commutative or not.) The other > less-ignorable place to put an ordering opt-out is at the head; we could > make things more clear by adding > > .parallelUnorderedStream() > alongside > .stream() > and > .parallelStream() > > The obvious implementation of parallelUnorderedStream is: > > default Stream<T> parallelUnorderedStream() { > return stream().unordered(); > } > > which is also the most efficient place to put the .unordered (at the head.) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121224/336c6219/attachment.html From brian.goetz at oracle.com Mon Dec 24 18:29:59 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 24 Dec 2012 21:29:59 -0500 Subject: unordered() In-Reply-To: References: <50D4DF22.4070805@oracle.com> Message-ID: <50D90FA7.7090804@oracle.com> Right. This is caused by the interaction of "IntFunction extends Function" and the overload resolution rules. Fixed by severing the extension relationship between IntFunction and Function.
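For readers on a released JDK 8, the overload ambiguity Joe hit is gone and his snippet stays primitive end to end. A sketch using the final API (IntStream.iterate rather than the b69-era Streams.iterate):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class SquaresDemo {
    public static void main(String[] args) {
        // (take 25 (squares-of (integers))) with the final JDK 8 API:
        // no boxing, and no map() overload clash with IntStream.
        int[] squares = IntStream.iterate(0, i -> i + 1)
                .map(i -> i * i)
                .limit(25)
                .toArray();
        System.out.println(Arrays.toString(squares));
    }
}
```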
On 12/24/2012 8:01 PM, Joe Bowbeer wrote: > Continuing with Uncle Bob's FP snippet: > > (take 25 (squares-of (integers))) > > I can code this in jdk8lambda-b69 as: > > Function square = (Integer i) -> i * i; > Stream is = Streams.iterate(0, i -> i + > 1).map(square).limit(25); > > > First, *note* that the following simplification won't compile because > the RHS creates an IntStream: > > Stream is = Streams.iterate(0, i -> i + 1).map((Integer i) > -> i * i).limit(25); > > This is a bug, right? > > > Now something about unordered().. > > When I add parallel() before the map() and print the result, I find that > the into() form creates ordered results no matter where I insert > unordered() in the pipeline, whereas forEach() already produces > unordered results. > > 1. Always ordered, regardless of unordered() > > out.print(is.map(Object::toString).into(new StringJoiner(" "))); > > => 0 1 4 9 ... 441 484 529 576 > > 2. Naturally unordered > > is.forEach(i -> out.print(i + " ")); > > => 0 529 576 441 ... 9 16 1 4 > > Seems weird. > > -Joe > > > > On Fri, Dec 21, 2012 at 2:13 PM, Brian Goetz > wrote: > > So, the move to a more explicit choice of merging or concurrent > tabulation also reduces (heh) the need for unordered(), though it > does not eliminate it completely. (Limit, cancelation, and > duplicate removal all have optimized versions if encounter order is > not significant.) > > Kevin pointed out that .unordered() is pretty easy to miss, and > people will not know that they don't know about it. One possible is > to make it more explicit at one end of the pipeline or the other > (the only operation that is order-injecting is sorted(), and > presumably if you are sorting you really care about encounter order > for the downstream ops, otherwise the sort was a waste of time.) 
> > The proposed tabulator / reducer stuff makes the order-sensitivity > clear at the tail end, which is a good place to put it -- the user > should know whether a reduce or a forEach is what they want -- if > not the user, who? (Only the user knows whether he cares about > order or not, and knows > whether his combination functions are > commutative or not.) The other > less-ignorable place to put an > ordering opt-out is at the head; we could > make things more clear > by adding > > .parallelUnorderedStream() > alongside > .stream() > and > .parallelStream() > > The obvious implementation of parallelUnorderedStream is: > > default Stream<T> parallelUnorderedStream() { > return stream().unordered(); > } > > which is also the most efficient place to put the .unordered (at the > head.) > > > From joe.bowbeer at gmail.com Mon Dec 24 18:43:13 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 24 Dec 2012 18:43:13 -0800 Subject: unordered() In-Reply-To: <50D90FA7.7090804@oracle.com> References: <50D4DF22.4070805@oracle.com> <50D90FA7.7090804@oracle.com> Message-ID: Btw, this form compiles in jdk8lambda-b69: Stream<Integer> is = Streams.iterate(0, i -> i + 1).map(i -> (Integer) (i * i)).limit(25); --Joe On Mon, Dec 24, 2012 at 6:29 PM, Brian Goetz wrote: > Right. This is caused by the interaction of "IntFunction extends > Function" and the overload resolution rules. Fixed by severing the > extension relationship between IntFunction and Function. > > On 12/24/2012 8:01 PM, Joe Bowbeer wrote: >> Continuing with Uncle Bob's FP snippet: >> >> (take 25 (squares-of (integers))) >> >> I can code this in jdk8lambda-b69 as: >> >> Function<Integer, Integer> square = (Integer i) -> i * i; >> Stream<Integer> is = Streams.iterate(0, i -> i + 1).map(square).limit(25); >> >> First, *note* that the following simplification won't compile because >> the RHS creates an IntStream: >> >> Stream<Integer> is = Streams.iterate(0, i -> i + 1).map((Integer i) >> -> i * i).limit(25); >> >> This is a bug, right?
>> >> >> Now something about unordered().. >> >> When I add parallel() before the map() and print the result, I find that >> the into() form creates ordered results no matter where I insert >> unordered() in the pipeline, whereas forEach() already produces >> unordered results. >> >> 1. Always ordered, regardless of unordered() >> >> out.print(is.map(Object::**toString).into(new StringJoiner(" "))); >> >> => 0 1 4 9 ... 441 484 529 576 >> >> 2. Naturally unordered >> >> is.forEach(i -> out.print(i + " ")); >> >> => 0 529 576 441 ... 9 16 1 4 >> >> Seems weird. >> >> -Joe >> >> >> >> On Fri, Dec 21, 2012 at 2:13 PM, Brian Goetz > > wrote: >> >> So, the move to a more explicit choice of merging or concurrent >> tabulation also reduces (heh) the need for unordered(), though it >> does not eliminate it completely. (Limit, cancelation, and >> duplicate removal all have optimized versions if encounter order is >> not significant.) >> >> Kevin pointed out that .unordered() is pretty easy to miss, and >> people will not know that they don't know about it. One possible is >> to make it more explicit at one end of the pipeline or the other >> (the only operation that is order-injecting is sorted(), and >> presumably if you are sorting you really care about encounter order >> for the downstream ops, otherwise the sort was a waste of time.) >> >> The proposed tabulator / reducer stuff makes the order-sensitivity >> clear at the tail end, which is a good place to put it -- the user >> should know whether a reduce or a forEach is what they want -- if >> not the user, who? (Only the user knows whether he cares about >> order or not, and knows whether his combination functions are >> commutative or not.) 
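On a released JDK 8 the choice Brian describes did end up explicit at the terminal op. A sketch of the ordered/unordered contrast Joe observed, using the final API:

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderDemo {
    public static void main(String[] args) {
        // Order-respecting terminal op: collect() honors encounter order
        // even on a parallel stream, so this is "0 1 4 9 ... 576" every time.
        String ordered = IntStream.range(0, 25)
                .parallel()
                .map(i -> i * i)
                .mapToObj(String::valueOf)
                .collect(Collectors.joining(" "));
        System.out.println(ordered);

        // Order-insensitive terminal op: plain forEach on a parallel stream
        // may interleave, like Joe's second example; forEachOrdered restores
        // encounter order at some cost in parallelism.
        IntStream.range(0, 25).parallel().map(i -> i * i)
                 .forEachOrdered(i -> System.out.print(i + " "));
        System.out.println();
    }
}
```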
The other less-ignorable place to put an >> ordering opt-out is at the head; we could >> make things more clear by adding >> >> .parallelUnorderedStream() >> alongside >> .stream() >> and >> .parallelStream() >> >> The obvious implementation of parallelUnorderedStream is: >> >> default Stream<T> parallelUnorderedStream() { >> return stream().unordered(); >> } >> >> which is also the most efficient place to put the .unordered (at the >> head.) >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121224/6a11024f/attachment-0001.html From forax at univ-mlv.fr Tue Dec 25 15:43:31 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 26 Dec 2012 00:43:31 +0100 Subject: Into In-Reply-To: References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> Message-ID: <50DA3A23.8040101@univ-mlv.fr> On 12/25/2012 12:25 AM, Joe Bowbeer wrote: > I was going to say something about into() here but the topic has > morphed to sequential()? yes, true. I agree with Brian that into is not as great as it can be, but I think the problem is the interface Destination that into uses. We have the Spliterator interface that can recursively divide a data structure; into() should use the dual concept, one that is able to gather several parts together to create a new data structure.
interface Demultiplexer<T, C> { // find a better name
    C create(Chunks<T> chunks);
}

interface Chunks<T> {
    int getChunkCountIfKnown();  // number of chunks
    int getTotalSizeIfKnown();   // total number of elements
    Stream<T> nextChunk();       // get each chunk; return null if there is no more chunk
}

with into declared like this:

    <C> C into(Demultiplexer<T, C> demux)

and a new method in stream which is able to copy the stream elements into an array at a specific offset. Rémi From Donald.Raab at gs.com Tue Dec 25 19:34:39 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Tue, 25 Dec 2012 22:34:39 -0500 Subject: Proposed rename of Map.forEach(BiBlock block) Message-ID: <6712820CB52CFB4D842561213A77C05404C0C17CEE@GSCMAMP09EX.firmwide.corp.gs.com> Can we rename the forEach(BiBlock block) method to forEachKeyValue(BiBlock block) on Map please? In GS Collections, our MapIterable interface extends RichIterable, which extends Iterable. This is a choice which makes it consistent with Smalltalk. This results in us having a method forEach(Procedure) defined on all our Map implementations. This will also cause us to have a method forEach(Block block) defined when JDK 8 is released.
For future reference, Trove defines these methods in THashMap: forEachKey(TObjectProcedure procedure) forEachValue(TObjectProcedure procedure) forEachEntry(TObjectObjectProcedure procedure) GS Collections defines these methods in MapIterable: forEach(Procedure procedure) // extended from RichIterable which ultimately extends Iterable forEachKey(Procedure procedure) forEachValue(Procedure procedure) forEachKeyValue(Procedure2 procedure) I would suggest adding the other two methods forEachKey(Block) and forEachValue(Block) to Map for JDK 8 even though this will result in more casting at call sites for users of Trove and GS Collections since our methods will become overloads of the equivalent methods on Map. The current recommended workaround as I understand it will be to have our function types extend Block and BiBlock when JDK 8 is released. From forax at univ-mlv.fr Wed Dec 26 06:08:33 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 26 Dec 2012 15:08:33 +0100 Subject: Proposed rename of Map.forEach(BiBlock block) In-Reply-To: <6712820CB52CFB4D842561213A77C05404C0C17CEE@GSCMAMP09EX.firmwide.corp.gs.com> References: <6712820CB52CFB4D842561213A77C05404C0C17CEE@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50DB04E1.10809@univ-mlv.fr> On 12/26/2012 04:34 AM, Raab, Donald wrote: > Can we rename the forEach(BiBlock block) method to forEachKeyValue(BiBlock block) on Map please? yes, there are 3 different forEach on Iterator, Iterable and Map, an already existing code that mixes two of them, by example a Map that is an iterator too, will not compile with Java 8. R?mi > > In GS Collections, our MapIterable interace extends RichIterable which extends Iterable. This is a choice which makes it consistent with Smalltalk. This results in us having a method forEach(Procedure) defined on all our Map implementations. This will also cause us to have a method forEach(Block block) defined when JDK 8 is released. 
Having a third overloaded forEach() method will cause a lot of confusion for us. > > https://github.com/goldmansachs/gs-collections/blob/master/collections-api/src/main/java/com/gs/collections/api/map/MapIterable.java#L33 > > Hopefully there are no plans to have Map re-defined as Map extends Iterable> in JDK 8. Otherwise this would result in forEach() having to be redefined as forEach>. > > For future reference, Trove defines these methods in THashMap: > > forEachKey(TObjectProcedure procedure) > forEachValue(TObjectProcedure procedure) > forEachEntry(TObjectObjectProcedure procedure) > > GS Collections defines these methods in MapIterable: > > forEach(Procedure procedure) // extended from RichIterable which ultimately extends Iterable > forEachKey(Procedure procedure) > forEachValue(Procedure procedure) > forEachKeyValue(Procedure2 procedure) > > I would suggest adding the other two methods forEachKey(Block) and forEachValue(Block) to Map for JDK 8 even though this will result in more casting at call sites for users of Trove and GS Collections since our methods will become overloads of the equivalent methods on Map. The current recommended workaround as I understand it will be to have our function types extend Block and BiBlock when JDK 8 is released. > > > > From dl at cs.oswego.edu Wed Dec 26 06:16:00 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 26 Dec 2012 09:16:00 -0500 Subject: Proposed rename of Map.forEach(BiBlock block) In-Reply-To: <50DB04E1.10809@univ-mlv.fr> References: <6712820CB52CFB4D842561213A77C05404C0C17CEE@GSCMAMP09EX.firmwide.corp.gs.com> <50DB04E1.10809@univ-mlv.fr> Message-ID: <50DB06A0.60803@cs.oswego.edu> On 12/26/12 09:08, Remi Forax wrote: > On 12/26/2012 04:34 AM, Raab, Donald wrote: >> Can we rename the forEach(BiBlock block) method to >> forEachKeyValue(BiBlock block) on Map please? 
> > yes, > there are 3 different forEach on Iterator, Iterable and Map, an already existing > code that mixes two of them, by example a Map that is an iterator too, > will not compile with Java 8. For this and other reasons, we decided when adding other default methods to map last week or so, to omit this method entirely, so it isn't in the current lambda repo. -Doug > > R?mi > >> >> In GS Collections, our MapIterable interace extends RichIterable >> which extends Iterable. This is a choice which makes it consistent with >> Smalltalk. This results in us having a method forEach(Procedure) >> defined on all our Map implementations. This will also cause us to have a >> method forEach(Block block) defined when JDK 8 is released. Having >> a third overloaded forEach() method will cause a lot of confusion for us. >> >> https://github.com/goldmansachs/gs-collections/blob/master/collections-api/src/main/java/com/gs/collections/api/map/MapIterable.java#L33 >> >> >> Hopefully there are no plans to have Map re-defined as Map extends >> Iterable> in JDK 8. Otherwise this would result in forEach() >> having to be redefined as forEach>. >> >> For future reference, Trove defines these methods in THashMap: >> >> forEachKey(TObjectProcedure procedure) >> forEachValue(TObjectProcedure procedure) >> forEachEntry(TObjectObjectProcedure procedure) >> >> GS Collections defines these methods in MapIterable: >> >> forEach(Procedure procedure) // extended from RichIterable which >> ultimately extends Iterable >> forEachKey(Procedure procedure) >> forEachValue(Procedure procedure) >> forEachKeyValue(Procedure2 procedure) >> >> I would suggest adding the other two methods forEachKey(Block) and >> forEachValue(Block) to Map for JDK 8 even though this will result >> in more casting at call sites for users of Trove and GS Collections since our >> methods will become overloads of the equivalent methods on Map. 
The current >> recommended workaround as I understand it will be to have our function types >> extend Block and BiBlock when JDK 8 is released. >> >> >> > From dl at cs.oswego.edu Wed Dec 26 07:08:17 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 26 Dec 2012 10:08:17 -0500 Subject: Into In-Reply-To: <50DA3A23.8040101@univ-mlv.fr> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> Message-ID: <50DB12E1.8050408@cs.oswego.edu> On 12/25/12 18:43, Remi Forax wrote: > On 12/25/2012 12:25 AM, Joe Bowbeer wrote: >> I was going to say something about into() here but the topic has morphed to >> sequential()? > > yes, true. > > I agree with Brian that into is not as great as it can be but I think the > problem is the interface Destination that into uses. Maybe we are focusing on different problems, but to me the main one is a spec/expectations clash: For user-friendliness, we want relevant properties of sources to be preserved in destinations. But for generality, we want anything to be put into anything. This shows up mainly in orderedness, but you can imagine users "expecting" any other property as well (like sortedness wrt a comparator). I think this is a no-win situation. 
-Doug From forax at univ-mlv.fr Wed Dec 26 07:23:10 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 26 Dec 2012 16:23:10 +0100 Subject: Into In-Reply-To: <50DB12E1.8050408@cs.oswego.edu> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> Message-ID: <50DB165E.1040200@univ-mlv.fr> On 12/26/2012 04:08 PM, Doug Lea wrote: > On 12/25/12 18:43, Remi Forax wrote: >> On 12/25/2012 12:25 AM, Joe Bowbeer wrote: >>> I was going to say something about into() here but the topic has >>> morphed to >>> sequential()? >> >> yes, true. >> >> I agree with Brian that into is not as great as it can be but I think >> the >> problem is the interface Destination that into uses. > > Maybe we are focusing on different problems, but to me the main > one is a spec/expectations clash: For user-friendliness, we want > relevant properties of sources to be preserved in destinations. > But for generality, we want anything to be put into anything. > This shows up mainly in orderedness, but you can imagine users > "expecting" any other property as well (like sortedness wrt a > comparator). I think this is a no-win situation. That's why we need two different stream ops: toList/toSet should preserve the properties of the source, i.e. create the 'right' Set or List implementation depending on the source's properties, while into() uses the destination's properties. The second problem is what the interface of a stream should look like when the stream is split for parallel computation and then gathered back together without going through an intermediary data structure, as happens now.
For toList/toSet, because the pipeline implementation controls the Set/List implementation, there is no need for such an interface; for into(), the question is: does this interface pull its own weight or not? > > -Doug > > Rémi From dl at cs.oswego.edu Wed Dec 26 07:27:18 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 26 Dec 2012 10:27:18 -0500 Subject: Proposed rename of Map.forEach(BiBlock block) In-Reply-To: <6712820CB52CFB4D842561213A77C05404C0C17CEE@GSCMAMP09EX.firmwide.corp.gs.com> References: <6712820CB52CFB4D842561213A77C05404C0C17CEE@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50DB1756.5010408@cs.oswego.edu> On 12/25/12 22:34, Raab, Donald wrote: > Hopefully there are no plans to have Map re-defined as Map<K, V> extends Iterable<Map.Entry<K, V>> in JDK 8. Otherwise this would result in forEach() having to be redefined as forEach(Block<Map.Entry<K, V>>). > The main snag is that there is no par/seq framework for Maps per se, so there is no way to ask for seq vs par forEach(BiBlock). Creating one is harder than for collections/streams, so is triaged out for now. But people can get all the streams stuff on any map.entrySet, at some expense in ugliness (and overhead), which is better than putting something terrible into place now. In the mean time, CHM exposes versions with names that cannot possibly clash with any plausible seq/par Map framework. Better ideas are welcome.
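Doug's entrySet workaround looks like this in practice. A sketch against the API as it eventually shipped; the map contents and the threshold are made up for illustration:

```java
import java.util.Map;

public class EntrySetDemo {
    // "All the streams stuff on any map.entrySet": Map itself has no
    // stream() method, but its entry set is a Collection and does.
    public static long countLargeValues(Map<String, Integer> m) {
        return m.entrySet().stream()
                .filter(e -> e.getValue() > 10)
                .count();
    }
}
```

The ugliness Doug mentions is the detour through Map.Entry accessors (getKey/getValue) where a BiBlock-style forEach would take two arguments directly.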
-Doug From dl at cs.oswego.edu Wed Dec 26 07:40:39 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 26 Dec 2012 10:40:39 -0500 Subject: Into In-Reply-To: <50DB165E.1040200@univ-mlv.fr> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> Message-ID: <50DB1A77.2080305@cs.oswego.edu> On 12/26/12 10:23, Remi Forax wrote: > that's why we need two different stream ops, > toList/toSet should conserve the property of the source i.e. create the 'right' > Set or List implementation depending on the source property and into that uses > the destination property. > > The second problem is what is the interface of a stream which is split to be > computed in parallel in order to be gathered without using an intermediary data > structure as now. For toList/toSet, because the pipeline implementation control > the Set/List implementation, so there is no need of such interface, for into(), > the question is is with interface pull it's own weight or not ? > Right. My line of thought was: We'll need/want the to-* versions anyway. Given this, do the Reducers/Tabulators pull their weight? People can always define such things themselves layered on top of stream API. While I'm at it, here's a fleshed out version of one possible to-* API. (Note: under this scheme methods sorted() and unique() go away). 
Object[] toArray();
Set<T> toSet();
List<T> toList();
List<T> toRandomAccessList();
List<T> toSortedList(Comparator<? super T> comparator);
List<T> toSortedList();
NavigableSet<T> toSortedSet();
NavigableSet<T> toSortedSet(Comparator<? super T> comparator);
Collection<T> toBag(); // unordered, possible dups
Map<K, T> toMap(Function<? super T, K> keyFn, BinaryOperator<T> mergeFn);
Map<K, Collection<T>> toMap(Function<? super T, K> keyFn);
NavigableMap<K, T> toSortedMap(Function<? super T, K> keyFn, Comparator<? super K> comparator, BinaryOperator<T> mergeFn);
NavigableMap<K, Collection<T>> toSortedMap(Function<? super T, K> keyFn, Comparator<? super K> comparator);

From Donald.Raab at gs.com Wed Dec 26 07:46:43 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Wed, 26 Dec 2012 10:46:43 -0500 Subject: Into In-Reply-To: <50DB1A77.2080305@cs.oswego.edu> References: <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> Message-ID: <6712820CB52CFB4D842561213A77C05404C0C17D1A@GSCMAMP09EX.firmwide.corp.gs.com> We have methods toSortedSet() and toSortedMap() that return SortedSet and SortedMap (as the names imply). We remain compatible back to Java 5, where NavigableSet and NavigableMap do not exist. I would request either calling the methods toNavigableSet()/toNavigableMap() and returning NavigableSet/NavigableMap, or having both toSorted and toNavigable forms and returning the appropriately named types.
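For comparison only: none of the to-* methods above shipped in this form. In the JDK 8 API as released, the same jobs fall out of collect(...) plus the Collectors factory, so the proposal can be mapped onto something runnable like this (this is the shipped API, not the proposal itself):

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToStarDemo {
    // toList() -> collect(Collectors.toList())
    public static List<String> toList() {
        return Stream.of("b", "a").collect(Collectors.toList());
    }

    // toSortedSet() -> collect into a TreeSet (a NavigableSet)
    public static NavigableSet<String> toSortedSet() {
        return Stream.of("b", "a").collect(Collectors.toCollection(TreeSet::new));
    }

    // the one-arg toMap(keyFn) -> Collectors.groupingBy(keyFn)
    public static Map<Integer, List<String>> byLength() {
        return Stream.of("a", "bb", "cc").collect(Collectors.groupingBy(String::length));
    }
}
```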
> > Object[] toArray(); > Set toSet(); > List toList(); > List toRandomAccessList(); > List toSortedList(Comparator comparator); > List toSortedList(); > NavigableSet toSortedSet(); > NavigableSet toSortedSet(Comparator comparator); > Collection toBag(); // unordered, possible dups > Map toMap(Function keyFn, BinaryOperator > mergeFn); > Map> toMap(Function keyFn); > NavigableMap toSortedMap(Function keyFn, > Comparator comparator, > BinaryOperator mergeFn); > NavigableMap> toSortedMap(Function > keyFn, > Comparator > comparator); From Donald.Raab at gs.com Wed Dec 26 07:51:33 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Wed, 26 Dec 2012 10:51:33 -0500 Subject: Into In-Reply-To: <6712820CB52CFB4D842561213A77C05404C0BFFF85@GSCMAMP09EX.firmwide.corp.gs.com> References: <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C0BFFF85@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C0C17D1C@GSCMAMP09EX.firmwide.corp.gs.com> Just to clarify, if these methods are added to Stream, we won't have any conflicts, as our methods appear on our Collections. But I think it might still be good to clarify the names. > -----Original Message----- > From: Raab, Donald [Tech] > Sent: Wednesday, December 26, 2012 10:47 AM > To: 'Doug Lea'; 'lambda-libs-spec-experts at openjdk.java.net' > Subject: RE: Into > > We have methods to toSortedSet() and toSortedMap() that return SortedSet > and SortedMap (as the names imply). We remain compatible back to Java > 5, where NavigableSet and NavigableMap do not exist. 
> > I would request either calling the methods > toNavigableSet()/toNavigableMap() and returning > NavigableSet/NavigableMap or having both toSorted and toNavigable forms > and returning the appropriately named types. > > > > > > Object[] toArray(); > > Set toSet(); > > List toList(); > > List toRandomAccessList(); > > List toSortedList(Comparator comparator); > > List toSortedList(); > > NavigableSet toSortedSet(); > > NavigableSet toSortedSet(Comparator comparator); > > Collection toBag(); // unordered, possible dups > > Map toMap(Function keyFn, BinaryOperator > > mergeFn); > > Map> toMap(Function keyFn); > > NavigableMap toSortedMap(Function keyFn, > > Comparator > comparator, > > BinaryOperator mergeFn); > > NavigableMap> toSortedMap(Function > T,K> keyFn, > > Comparator > K> comparator); From dl at cs.oswego.edu Wed Dec 26 08:01:15 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 26 Dec 2012 11:01:15 -0500 Subject: Into In-Reply-To: <6712820CB52CFB4D842561213A77C05404C0C17D1C@GSCMAMP09EX.firmwide.corp.gs.com> References: <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C0BFFF85@GSCMAMP09EX.firmwide.corp.gs.com> <6712820CB52CFB4D842561213A77C05404C0C17D1C@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50DB1F4B.7030001@cs.oswego.edu> On 12/26/12 10:51, Raab, Donald wrote: > Just to clarify, if these methods are added to Stream, we won't have any conflicts, as our methods appear on our Collections. But I think it might still be good to clarify the names. > Sure. In the unlikely event that Brian agrees about going this route :-), the names are up for grabs. 
I listed them this way because we find that people don't like the name "Navigable" (because it does not immediately imply sortedness in people's minds) so the less often you are forced to use it the better. But that's a minor concern. (Note that several method names don't exactly match up with return type names. There's no j.u. name for List & RandomAccess, or for a Collection that is not necessarily a List but unlike Set permits duplicates. (Further aside: If there were, then Map.values() should use it as well, but too late for that.)) -Doug >> -----Original Message----- >> From: Raab, Donald [Tech] >> Sent: Wednesday, December 26, 2012 10:47 AM >> To: 'Doug Lea'; 'lambda-libs-spec-experts at openjdk.java.net' >> Subject: RE: Into >> >> We have methods to toSortedSet() and toSortedMap() that return SortedSet >> and SortedMap (as the names imply). We remain compatible back to Java >> 5, where NavigableSet and NavigableMap do not exist. >> >> I would request either calling the methods >> toNavigableSet()/toNavigableMap() and returning >> NavigableSet/NavigableMap or having both toSorted and toNavigable forms >> and returning the appropriately named types. 
>> >> >>> >>> Object[] toArray(); >>> Set toSet(); >>> List toList(); >>> List toRandomAccessList(); >>> List toSortedList(Comparator comparator); >>> List toSortedList(); >>> NavigableSet toSortedSet(); >>> NavigableSet toSortedSet(Comparator comparator); >>> Collection toBag(); // unordered, possible dups >>> Map toMap(Function keyFn, BinaryOperator >>> mergeFn); >>> Map> toMap(Function keyFn); >>> NavigableMap toSortedMap(Function keyFn, >>> Comparator >> comparator, >>> BinaryOperator mergeFn); >>> NavigableMap> toSortedMap(Function>> T,K> keyFn, >>> Comparator>> K> comparator); > > From Donald.Raab at gs.com Wed Dec 26 08:09:43 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Wed, 26 Dec 2012 11:09:43 -0500 Subject: Into In-Reply-To: <50DB1F4B.7030001@cs.oswego.edu> References: <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C0BFFF85@GSCMAMP09EX.firmwide.corp.gs.com> <6712820CB52CFB4D842561213A77C05404C0C17D1C@GSCMAMP09EX.firmwide.corp.gs.com> <50DB1F4B.7030001@cs.oswego.edu> Message-ID: <6712820CB52CFB4D842561213A77C05404C0C17D1F@GSCMAMP09EX.firmwide.corp.gs.com> Couldn't we push the Navigable methods up to Sorted using default methods? > Sure. In the unlikely event that Brian agrees about going this route :- > ), the names are up for grabs. I listed them this way because we find > that people don't like the name "Navigable" > (because it does not immediately imply sortedness in people's minds) so > the less often you are forced to use it the better. > But that's a minor concern. > That's why we have a Bag, which is what toBag() would return for us. We're fine as long as it returns Collection, as a MutableBag is a Collection. > (Note that several method names don't exactly match up with return type > names. There's no j.u. 
name for List & RandomAccess, or for a Collection > that is not necessarily a List but unlike Set permits duplicates. > (Further aside: If there were, then Map.values() should use it as well, > but too late for that.)) > > -Doug > > From Donald.Raab at gs.com Wed Dec 26 08:12:17 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Wed, 26 Dec 2012 11:12:17 -0500 Subject: Into In-Reply-To: <50DB1A77.2080305@cs.oswego.edu> References: <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> Message-ID: <6712820CB52CFB4D842561213A77C05404C0C17D20@GSCMAMP09EX.firmwide.corp.gs.com> > Map> toMap(Function keyFn); This is essentially groupBy. From dl at cs.oswego.edu Wed Dec 26 08:21:03 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 26 Dec 2012 11:21:03 -0500 Subject: Into In-Reply-To: <6712820CB52CFB4D842561213A77C05404C0C17D20@GSCMAMP09EX.firmwide.corp.gs.com> References: <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C0C17D20@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50DB23EF.3070800@cs.oswego.edu> On 12/26/12 11:12, Raab, Donald wrote: >> Map> toMap(Function keyFn); > > This is essentially groupBy. > Yeah, and worse, it is the sucky version of groupBy that doesn't force you to do the right thing and merge while building! (The overloaded two-arg form does that.) 
I'm not even sure it is a service to provide it, although it is essential here and there and a little too painful for users to do themselves without the method. In any case, I think having a highly uniform naming scheme outweighs most of these concerns. -Doug From forax at univ-mlv.fr Wed Dec 26 08:52:24 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 26 Dec 2012 17:52:24 +0100 Subject: Into In-Reply-To: <50DB1A77.2080305@cs.oswego.edu> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> Message-ID: <50DB2B48.2040109@univ-mlv.fr> On 12/26/2012 04:40 PM, Doug Lea wrote: > On 12/26/12 10:23, Remi Forax wrote: >> that's why we need two different stream ops, >> toList/toSet should conserve the property of the source i.e. create >> the 'right' >> Set or List implementation depending on the source property and into >> that uses >> the destination property. >> >> The second problem is what is the interface of a stream which is >> split to be >> computed in parallel in order to be gathered without using an >> intermediary data >> structure as now. For toList/toSet, because the pipeline >> implementation control >> the Set/List implementation, so there is no need of such interface, >> for into(), >> the question is is with interface pull it's own weight or not ? >> > > Right. My line of thought was: We'll need/want the to-* versions > anyway. Given this, do the Reducers/Tabulators pull their weight? > People can always define such things themselves layered on top > of stream API. > > While I'm at it, here's a fleshed out version of one possible > to-* API. 
> (Note: under this scheme methods sorted() and unique() go away). No, I think it's better to have only toList() and toSet(); the result of stream.sorted().toSet() will return a NavigableSet/SortedSet. The idea is that the to* methods will choose the best implementation using the properties of the pipeline. If you want a specific implementation, then use into(). Maybe, for toSet, we have two methods (toSet and toNavigableSet), but in that case, if the elements of the pipeline are not sorted (by using sorted(), or because the source collection is a navigable set and the specified ops don't modify the order) then it should throw an exception. Given that, here is my wish list:

U[] toArray(Class<U> clazz); // may throw an ArrayStoreException
Set<T> toSet(); // may throw an IllegalStateException if elements are not unique
NavigableSet<T> toSortedSet(); // may throw an IllegalStateException if elements are not sorted and unique
List<T> toList(); // always returns a random-access list; if you want a sequential list, use into(new LinkedList<>())
List<T> toSortedList(); // always returns a random-access list; may throw an IllegalStateException if elements are not sorted

All of them are implemented in the Stream interface as default methods, checking the pipeline flags and delegating to into().
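Rémi's flag-checking default methods could be sketched as follows. PipelineSketch, its sorted flag, and this into() are hypothetical stand-ins invented here to illustrate the shape of the idea; they are not the actual stream internals:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PipelineSketch<T> {
    private final Collection<T> elements;
    private final boolean sortedFlag; // stands in for the pipeline's SORTED flag

    public PipelineSketch(Collection<T> elements, boolean sortedFlag) {
        this.elements = elements;
        this.sortedFlag = sortedFlag;
    }

    // into() honors the destination's own properties
    public <C extends Collection<T>> C into(C destination) {
        destination.addAll(elements);
        return destination;
    }

    // to-* methods check the pipeline flag and delegate to into()
    public NavigableSet<T> toSortedSet() {
        if (!sortedFlag) throw new IllegalStateException("elements are not sorted");
        return into(new TreeSet<T>());
    }

    public List<T> toList() {
        return into(new ArrayList<T>()); // always random-access
    }
}
```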
> > Object[] toArray(); > Set toSet(); > List toList(); > List toRandomAccessList(); > List toSortedList(Comparator comparator); > List toSortedList(); > NavigableSet toSortedSet(); > NavigableSet toSortedSet(Comparator comparator); > Collection toBag(); // unordered, possible dups > Map toMap(Function keyFn, BinaryOperator > mergeFn); > Map> toMap(Function keyFn); > NavigableMap toSortedMap(Function keyFn, > Comparator comparator, > BinaryOperator mergeFn); > NavigableMap> toSortedMap(Function T,K> keyFn, > Comparator K> comparator); > Rémi From dl at cs.oswego.edu Wed Dec 26 09:07:02 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 26 Dec 2012 12:07:02 -0500 Subject: Into In-Reply-To: <50DB2B48.2040109@univ-mlv.fr> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <50DB2B48.2040109@univ-mlv.fr> Message-ID: <50DB2EB6.30700@cs.oswego.edu> On 12/26/12 11:52, Remi Forax wrote: > No, I think it's better to have only toList() and toSet(), > the result of stream.sorted().toSet() will return a NavigableSet/SortedSet. > The idea is that the method to* will choose the best implementation using the > property of the pipeline. > > If you want a specific implementation, then use into(). Sorry, I still don't buy it. If you want a specific implementation, then my sense is that you will end up writing something like the following anyway:

Stream s = ...;
SomeCollection dest = ...
// add s to dest via (par/seq) forEach or loop or whatever

so why bother adding all the support code that people will probably not use anyway in custom situations because, well, they are custom situations.
So to me, Reducers etc are in the maybe-nice-to-have category. -Doug From joe.bowbeer at gmail.com Wed Dec 26 10:00:07 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Wed, 26 Dec 2012 10:00:07 -0800 Subject: Into In-Reply-To: <50DB165E.1040200@univ-mlv.fr> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> Message-ID: I agree that we still need an into() for things like Joiner that do not want to implement an entire Collection interface, just the sink/aggregator part. On Dec 26, 2012 7:25 AM, "Remi Forax" wrote: > On 12/26/2012 04:08 PM, Doug Lea wrote: > >> On 12/25/12 18:43, Remi Forax wrote: >> >>> On 12/25/2012 12:25 AM, Joe Bowbeer wrote: >>> >>>> I was going to say something about into() here but the topic has >>>> morphed to >>>> sequential()? >>>> >>> >>> yes, true. >>> >>> I agree with Brian that into is not as great as it can be but I think the >>> problem is the interface Destination that into uses. >>> >> >> Maybe we are focusing on different problems, but to me the main >> one is a spec/expectations clash: For user-friendliness, we want >> relevant properties of sources to be preserved in destinations. >> But for generality, we want anything to be put into anything. >> This shows up mainly in orderedness, but you can imagine users >> "expecting" any other property as well (like sortedness wrt a >> comparator). I think this is a no-win situation. >> > > that's why we need two different stream ops, > toList/toSet should conserve the property of the source i.e. create the > 'right' Set or List implementation depending on the source property and > into that uses the destination property. 
> > The second problem is what is the interface of a stream which is split to > be computed in parallel in order to be gathered without using an > intermediary data structure as now. For toList/toSet, because the pipeline > implementation control the Set/List implementation, so there is no need of > such interface, for into(), the question is is with interface pull it's own > weight or not ? > > >> -Doug >> >> > Rémi > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121226/aa6361cb/attachment.html From brian.goetz at oracle.com Wed Dec 26 10:00:10 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 26 Dec 2012 13:00:10 -0500 Subject: Into In-Reply-To: References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> Message-ID: <50DB3B2A.2000909@oracle.com> > I was just translating a simple example from Uncle Bob's recent FP > resolution(*). The most difficult problem given the current state of > jdk8lambda was trying to print a stream... > > Using StringJoiner seems like the coolest way to do this currently: > > stream.into(new StringJoiner(", ", "[", "]")) Right, that's the current plan. StringJoiner implements Stream.Destination. > But how's this supposed to work without into()? Well, that's one of the things to figure out. I think it fits pretty nicely as a mutable reduce.

make-accumulator() -> new StringJoiner()
accumulate(a, e)   -> a.add(e)
combine(a1, a2)    -> if (!a2.isEmpty) a1.addAll(a2)

This parallelizes well. The only trickiness is adding the pre/post strings, which have to be deferred to the very end. All doable.
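Brian's three functions map one-to-one onto the three-argument mutable-reduce form of collect(...) that ended up shipping in java.util.stream.Stream; StringJoiner.merge() does the combine step and defers the prefix/suffix exactly as described:

```java
import java.util.StringJoiner;
import java.util.stream.Stream;

public class JoinDemo {
    public static String join() {
        return Stream.of("a", "b", "c")
                .collect(() -> new StringJoiner(", ", "[", "]"), // make-accumulator
                         StringJoiner::add,                      // accumulate(a, e)
                         StringJoiner::merge)                    // combine(a1, a2)
                .toString();
    }
}
```

In the shipped API this pattern is also packaged up as Collectors.joining(", ", "[", "]"), so the pre/post strings are applied only once, at the very end.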
> Btw, the lack of a generic Joiner that accepts any ol' object or > primitive is causing me some grief. Given a stream of ints or even > Integers, having to manually map(Object::toString) seems like something > StringJoiner should be doing automatically. Yeah, it should accept Object and call toString on it. String.toString() is cheap. From forax at univ-mlv.fr Wed Dec 26 10:02:11 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 26 Dec 2012 19:02:11 +0100 Subject: Into In-Reply-To: <50DB2EB6.30700@cs.oswego.edu> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <50DB2B48.2040109@univ-mlv.fr> <50DB2EB6.30700@cs.oswego.edu> Message-ID: <50DB3BA3.3000501@univ-mlv.fr> On 12/26/2012 06:07 PM, Doug Lea wrote: > On 12/26/12 11:52, Remi Forax wrote: > >> No, I think it's better to have only toList() and toSet(), >> the result of stream.sorted().toSet() will return a >> NavigableSet/SortedSet. >> The idea is that the method to* will choose the best implementation >> using the >> property of the pipeline. >> >> If you want a specific implementation, then use into(). > > Sorry, I still don't buy it. If you want a specific implementation, > then my sense is that you will end up writing something like > the following anyway: > > Stream s = ...; > SomeCollection dest = ... > // add s to dest via (par/seq) forEach or loop or whatever again, letting people do the copy will create a lot of non-thread-safe code. I see forEach as a necessary evil, not as something that people should use every day.
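The two safe alternatives to a hand-rolled forEach copy can be sketched with the API as it eventually shipped (the element values here are arbitrary): a reduce-like collect that preserves encounter order, and a forEach into a genuinely concurrent container that only preserves the multiset of elements:

```java
import java.util.Collection;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GatherDemo {
    // Reduce-like: the framework accumulates per-split and merges, so no
    // user code ever touches a non-thread-safe collection concurrently.
    public static List<Integer> ordered(Stream<Integer> s) {
        return s.collect(Collectors.toList());
    }

    // ForEach-like: shovel into a single shared *concurrent* container;
    // arrival order, not encounter order. (forEach into a plain ArrayList
    // would be exactly the non-thread-safe code being warned about.)
    public static Collection<Integer> concurrent(Stream<Integer> s) {
        Queue<Integer> q = new ConcurrentLinkedQueue<>();
        s.forEach(q::add);
        return q;
    }
}
```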
> > so why bother adding all the support code that people will probably > not use anyway in custom situations because, well, they are custom > situations. So to me, Reducers etc are in the maybe-nice-to-have > category. while I agree that custom reducers have to fly by themselves, we need to provide an operation that pulls all elements from a parallel stream and puts them into any collection in a thread-safe manner that doesn't require 10 eyeballs to look at the code. > > -Doug > > Rémi From Donald.Raab at gs.com Wed Dec 26 10:38:58 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Wed, 26 Dec 2012 13:38:58 -0500 Subject: A couple of tabulate/Tabulators.groupBy examples Message-ID: <6712820CB52CFB4D842561213A77C05404C0C17D39@GSCMAMP09EX.firmwide.corp.gs.com> I updated our kata with the latest changes in the Dec. 17 binaries. Since groupBy was removed in this release, I had to work through Brian's new tabulate/Tabulator approach in a couple of places. Except for some type inference problems where I had to specify some extra types in various places, it was not too hard to get to work. Not sure if there is an easier solution for the types here. A simple example. Before:

Map> multimap = this.company.getCustomers().stream().groupBy(Customer::getCity);

After:

Map> multimap = this.company.getCustomers()
    .stream()
    .tabulate(Tabulators.groupBy(Customer::getCity));

A more complex example.
Before:

Map> multimap = this.company.getCustomers()
    .stream()
    .groupBy(customer ->
        customer.getOrders()
            .stream()
            .flatMap((Block sink, Order element) -> { element.getLineItems().forEach(sink); })
            .map(LineItem::getValue)
            .reduce(0.0, (x, y) -> Math.max(x, y)));

After:

Map> multimap = this.company.getCustomers()
    .stream()
    .tabulate(Tabulators.groupBy((Customer customer) ->
        customer.getOrders()
            .stream()
            .flatMap((Block sink, Order element) -> { element.getLineItems().forEach(sink); })
            .map(lineItem -> lineItem.getValue())
            .reduce(0.0, (x, y) -> Math.max(x, y))));

From brian.goetz at oracle.com Wed Dec 26 10:38:54 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 26 Dec 2012 13:38:54 -0500 Subject: Into In-Reply-To: <50DB3BA3.3000501@univ-mlv.fr> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <50DB2B48.2040109@univ-mlv.fr> <50DB2EB6.30700@cs.oswego.edu> <50DB3BA3.3000501@univ-mlv.fr> Message-ID: <50DB443E.5010800@oracle.com> Let's try to separate some things here. There's lots of defending of into() because it is (a) useful and (b) safe. That's all good. But let's see if we can think of these more as functional requirements than as mandating a specific API (whether one that happens to be already implemented, like into(), or the newly proposed ones like toEveryKindOfCollection().)
Into as currently implemented has many negatives, including:

- Adds conceptual and API surface area -- destinations have to implement Destination, and the semantics of into are weird and unique to into
- Will likely parallelize terribly
- Doesn't provide the user enough control over how the into'ing is done (seq vs par, order-sensitive vs not)

So let's step back and talk requirements. I think the only clear functional requirement is that it should be easy to accumulate the result of a stream into a collection or similar container. It should be easy to customize what kind of collection, but also easy to say "give me a reasonable default." Additionally, the performance characteristics should be transparent; users should be able to figure out what's going to happen. There are lots of other nice-to-haves, such as:

- Minimize impact on Collection implementations
- Minimize magic/guessing about the user's intent
- Support destinations that aren't collections
- Minimize tight coupling of the Stream API to existing Collection APIs

The current into() fails on nearly all of these. At the risk of being a broken record, there are really two cases here:

- Reduce-like. Aggregate values into groups at the leaves of the tree, and then combine the groups somehow. This preserves encounter order, but has merging overhead. Merging overhead ranges from small (build a conc-tree) to large (add the elements of the right subresult individually to the left subresult) depending on the chosen data structure.
- Foreach-like. Have each leaf shovel its values into a single shared concurrent container (imagine a ConcurrentVector class.) This ignores encounter order, but a well-written concurrent destination might be able to outperform the merging behavior.

In earlier implementations we tried to guess between the two modes based on the ordering characteristics of the source and the order-preserving characteristics of the intermediate ops.
This is both risky and harder for the user to control (hence hacks like .unordered()). I think this general approach is a loser for all but the most special cases. Since we can't read the user's mind about whether they care about encounter order or not (e.g., they may use a List because there's no Multiset implementation handy), I think we need to provide ways of aggregating that let the user explicitly choose between order-preserving aggregation and concurrent aggregation. I think having the word "concurrent" in the code somewhere isn't a bad clue. On 12/26/2012 1:02 PM, Remi Forax wrote: > On 12/26/2012 06:07 PM, Doug Lea wrote: >> On 12/26/12 11:52, Remi Forax wrote: >> >>> No, I think it's better to have only toList() and toSet(), >>> the result of stream.sorted().toSet() will return a >>> NavigableSet/SortedSet. >>> The idea is that the method to* will choose the best implementation >>> using the >>> property of the pipeline. >>> >>> If you want a specific implementation, then use into(). >> >> Sorry, I still don't buy it. If you want a specific implementation, >> then my sense is that you will end up writing something like >> the following anyway: >> >> Stream s = ...; >> SomeCollection dest = ... >> // add s to dest via (par/seq) forEach or loop or whatever > > again, letting people to do the copy will create a lot of non thread > safe codes. > I see forEach is a necessary evil, not as something that people should > use every days. > >> >> so why bother adding all the support code that people will probably >> not use anyway in custom situations because, well, they are custom >> situations. So to me, Reducers etc are in the maybe-nice-to-have >> category. > > while I agree that custom reducers have to fly by themselves, > we need to provide an operation that pull all elements from a parallel > stream and put them in any collections in a thread safe manner that > doesn't require 10 eyeballs to look at the code. 
> >> -Doug >> >> > > Rémi > From brian.goetz at oracle.com Wed Dec 26 10:49:26 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 26 Dec 2012 13:49:26 -0500 Subject: A couple of tabulate/Tabulators.groupBy examples In-Reply-To: <6712820CB52CFB4D842561213A77C05404C0C17D39@GSCMAMP09EX.firmwide.corp.gs.com> References: <6712820CB52CFB4D842561213A77C05404C0C17D39@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50DB46B6.5020705@oracle.com> > A simple example. > > Before: > > Map> multimap = this.company.getCustomers().stream().groupBy(Customer::getCity); > > After: > > Map> multimap = > this.company.getCustomers() > .stream() > .tabulate(Tabulators.groupBy(Customer::getCity)); I would think the explicit arguments in this case could be avoided? Usually we've only had problems with inference when there are constructor refs (ArrayList::new). In any case, there's work going on to improve inference here. With properly working inference and static import we'd get: > this.company.getCustomers() > .stream() > .tabulate(groupBy(Customer::getCity)); Where the Tabulators stuff really shines is the ability to compose these things. The old groupBy/reduceBy did a decent job at what it did well, but fell off a cliff when you asked it to do much more. > A more complex example. This is kind of a weird example, since the synthetic key is a number. Another way to accomplish this, that might be more in line with real business usage, is by the equivalent of the old "mapped": Map largestLineItemByCustomer = customers.stream().tabulate(mappedTo(c -> c.getOrders()...reduce())); This creates a map from Customer to your max order value reduction.
If you then want a "top ten customers by largest transaction", you could do: largestLineItemByCustomer.entrySet() .stream() .sorted(Comparators.naturalOrderValues()) .limit(10) .map(Map.Entry::getKey) .forEach(...); > > Before: > > Map> multimap = this.company.getCustomers() > .stream() > .groupBy(customer -> > customer.getOrders() > .stream() > .flatMap((Block sink, Order element) -> {element.getLineItems().forEach(sink);}) > .map(LineItem::getValue) > .reduce(0.0, (x, y) -> Math.max(x,y))); From forax at univ-mlv.fr Wed Dec 26 11:05:19 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 26 Dec 2012 20:05:19 +0100 Subject: Into In-Reply-To: <50DB443E.5010800@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <50DB2B48.2040109@univ-mlv.fr> <50DB2EB6.30700@cs.oswego.edu> <50DB3BA3.3000501@univ-mlv.fr> <50DB443E.5010800@oracle.com> Message-ID: <50DB4A6F.8060202@univ-mlv.fr> On 12/26/2012 07:38 PM, Brian Goetz wrote: > > Since we can't read the user's mind about whether they care about > encounter order or not (e.g., they may use a List because there's no > Multiset implementation handy), I think we need to provide ways of > aggregating that let the user explicitly choose between > order-preserving aggregation and concurrent aggregation. I think > having the word "concurrent" in the code somewhere isn't a bad clue. This defeats one important purpose of the Stream API, which is to be parallel/sequential agnostic from the user's POV. And again, people will use parallel streams without the concurrent aggregator. What about this use case? 
List<T> extractList(Stream<T> stream) { return stream. ... ops ... // <-- please complete } ... main(...) { Collection<T> c = ... extractList(c.stream()); extractList(c.parallelStream()); } Here having a toList() that takes care of concurrency if needed is very appealing. Rémi From brian.goetz at oracle.com Wed Dec 26 11:11:12 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 26 Dec 2012 14:11:12 -0500 Subject: Into In-Reply-To: <50DB4A6F.8060202@univ-mlv.fr> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <50DB2B48.2040109@univ-mlv.fr> <50DB2EB6.30700@cs.oswego.edu> <50DB3BA3.3000501@univ-mlv.fr> <50DB443E.5010800@oracle.com> <50DB4A6F.8060202@univ-mlv.fr> Message-ID: <50DB4BD0.1060800@oracle.com> >> Since we can't read the user's mind about whether they care about >> encounter order or not (e.g., they may use a List because there's no >> Multiset implementation handy), I think we need to provide ways of >> aggregating that let the user explicitly choose between >> order-preserving aggregation and concurrent aggregation. I think >> having the word "concurrent" in the code somewhere isn't a bad clue. > > This defeats one important purpose of the Stream API, which is to be > parallel/sequential agnostic from the user's POV. Only to the extent that reality forces us to. The user has to declare whether they care about encounter order vs arrival order, or (equivalently) whether their reducers are associative or commutative. The user *does* have to understand this, otherwise we lose many of the benefits of parallelism by being forced to make bad assumptions. 
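Brian's associative-vs-commutative point can be illustrated with the reduce form that later shipped: string concatenation is associative but not commutative, yet a parallel reduce over an ordered stream is still deterministic, because the framework combines partial results in encounter order. A small sketch (class name is illustrative only):

```java
import java.util.stream.Stream;

public class OrderedReduce {
    // String concatenation is associative but NOT commutative. The parallel
    // reduce is still deterministic because partial results are combined in
    // encounter order -- which is exactly why reducers only need to be
    // associative, not commutative.
    static String concat(Stream<String> s) {
        return s.parallel().reduce("", String::concat);
    }

    public static void main(String[] args) {
        String r = concat(Stream.of("a", "b", "c", "d"));
        if (!r.equals("abcd")) throw new AssertionError(r);
    }
}
```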
> Here having a toList() that take care about concurrency if needed is very appealing. Only if the cost of this is not that performance sucks in surprising ways. The performance of into() sucks in surprising ways. From Donald.Raab at gs.com Wed Dec 26 11:14:02 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Wed, 26 Dec 2012 14:14:02 -0500 Subject: A couple of tabulate/Tabulators.groupBy examples In-Reply-To: <50DB46B6.5020705@oracle.com> References: <6712820CB52CFB4D842561213A77C05404C0C17D39@GSCMAMP09EX.firmwide.corp.gs.com> <50DB46B6.5020705@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C0C17D3E@GSCMAMP09EX.firmwide.corp.gs.com> It is a complex contrived example made that way to get students of our kata class thinking and combining multiple steps. The solution with our API today is not that bad IMO. Simple: MutableListMultimap multimap = this.company.getCustomers().groupBy(Customer::getCity); Complex: MutableListMultimap multimap = this.company.getCustomers() .groupBy(customer -> customer.getOrders() .asLazy() .flatCollect(Order::getLineItems) .collectDouble(LineItem::getValue) .max()); It actually got better since groupBy was removed from Iterable/Collection, because we'll no longer have to cast the lambda to our Function. > > A more complex example. > > This is kind of a weird example, since the synthetic key is a number. > > Another way to accomplish this, that might be more in line with real > business usage, is by the equivalent of the old "mapped": > > Map largestLineItemByByCustomer = > customers.stream().tabulate(mappedTo(c -> c.getOrders()...reduce())); > > This creates a map from Customer to your max order value reduction. 
If you > then want a "top ten customers by largest transaction", you could do: > > largestLineItemByCustomer.entrySet() > .stream() > .sorted(Comparators.naturalOrderValues()) > .limit(10) > .map(Map.Entry::getKey) > .forEach(...); > > > > > > > Before: > > > > Map> multimap = > this.company.getCustomers() > > .stream() > > .groupBy(customer -> > > customer.getOrders() > > .stream() > > .flatMap((Block sink, Order element) > -> {element.getLineItems().forEach(sink);}) > > .map(LineItem::getValue) > > .reduce(0.0, (x, y) -> Math.max(x,y))); From forax at univ-mlv.fr Wed Dec 26 11:52:21 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 26 Dec 2012 20:52:21 +0100 Subject: Into In-Reply-To: <50DB443E.5010800@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <50DB2B48.2040109@univ-mlv.fr> <50DB2EB6.30700@cs.oswego.edu> <50DB3BA3.3000501@univ-mlv.fr> <50DB443E.5010800@oracle.com> Message-ID: <50DB5575.7020006@univ-mlv.fr> On 12/26/2012 07:38 PM, Brian Goetz wrote: > Let's try to separate some things here. > > There's lots of defending of into() because it is (a) useful and (b) > safe. That's all good. But let's see if we can think of these more > as functional requirements than as mandating a specific API (whether > one that happens to be already implemented, like into(), or the newly > proposed ones like toEveryKindOfCollection().) 
> > Into as currently implemented has many negatives, including: > - Adds conceptual and API surface area -- destinations have to > implement Destination, the semantics of into are weird and unique to into > - Will likely parallelize terribly > - Doesn't provide the user enough control over how the into'ing is > done (seq vs par, order-sensitive vs not) > > So let's step back and talk requirements. > > I think the only clear functional requirement is that it should be > easy to accumulate the result of a stream into a collection or similar > container. It should be easy to customize what kind of collection, > but also easy to say "give me a reasonable default." Additionally, the > performance characteristics should be transparent; users should be > able to figure out what's going to happen. > > There are lots of other nice-to-haves, such as: > - Minimize impact on Collection implementations > - Minimize magic/guessing about the user's intent > - Support destinations that aren't collections > - Minimize tight coupling of Stream API to existing Collection APIs > > The current into() fails on nearly all of these. Brian, you confound the concept of into() with its current implementation. - Minimize impact on Collection implementations => fails because the Destination interface says that a destination should have a method add(Stream), but this is not a requirement; a destination can use the already existing add/addAll - Minimize magic/guessing about the user's intent => fails, due to the implementation - Support destinations that aren't collections => works - Minimize tight coupling of Stream API to existing Collection APIs => part of the problem is point 1; the other part is that loose coupling will come at a price for users, like requiring them to specify too many parameters for common use cases. 
for me the requirements are (in that order): - support classical collections in a way that is always thread-safe - support destinations that are not collections - minimize coupling with the Collection API if you want to compare your tabulator with into(), into() has to be correctly specified; I think it should be something like this: into(Supplier> supplier) with the interface Destination defined like this interface Destination> { boolean add(T element); boolean addAll(D destination); } example of usage: stream.into(ArrayList::new); // with Collection implements Destination> with that, I see your tabulator as a finer-grained API than into(), but one that requires users to send more information. Rémi From forax at univ-mlv.fr Wed Dec 26 11:57:07 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 26 Dec 2012 20:57:07 +0100 Subject: Into In-Reply-To: <50DB4BD0.1060800@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <50DB2B48.2040109@univ-mlv.fr> <50DB2EB6.30700@cs.oswego.edu> <50DB3BA3.3000501@univ-mlv.fr> <50DB443E.5010800@oracle.com> <50DB4A6F.8060202@univ-mlv.fr> <50DB4BD0.1060800@oracle.com> Message-ID: <50DB5693.7000001@univ-mlv.fr> On 12/26/2012 08:11 PM, Brian Goetz wrote: >>> Since we can't read the user's mind about whether they care about >>> encounter order or not (e.g., they may use a List because there's no >>> Multiset implementation handy), I think we need to provide ways of >>> aggregating that let the user explicitly choose between >>> order-preserving aggregation and concurrent aggregation. 
I think >>> having the word "concurrent" in the code somewhere isn't a bad clue. >> >> This defeats one important purpose of the Stream API, which is to be >> parallel/sequential agnostic from the user's POV. > > Only to the extent that reality forces us to. The user has to declare > whether they care about encounter order vs arrival order, or > (equivalently) whether their reducers are associative or commutative. > The user *does* have to understand this, otherwise we lose many of the > benefits of parallelism by being forced to make bad assumptions. you can hide that by asking what the user wants as a result. > >> Here having a toList() that takes care of concurrency if needed is >> very appealing. > > Only if the cost of this is not that performance sucks in surprising > ways. The performance of into() sucks in surprising ways. > into() sucks because it's currently implemented as sequential().into(). toList() doesn't require that and into() can be specified to avoid that. Rémi From brian.goetz at oracle.com Wed Dec 26 13:16:25 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 26 Dec 2012 16:16:25 -0500 Subject: Proposed rename of Map.forEach(BiBlock block) In-Reply-To: <6712820CB52CFB4D842561213A77C05404C0C17CEE@GSCMAMP09EX.firmwide.corp.gs.com> References: <6712820CB52CFB4D842561213A77C05404C0C17CEE@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50DB6929.3080808@oracle.com> Makes sense to me. On 12/25/2012 10:34 PM, Raab, Donald wrote: > Can we rename the forEach(BiBlock block) method to forEachKeyValue(BiBlock block) on Map please? > > In GS Collections, our MapIterable interface extends RichIterable which extends Iterable. This is a choice which makes it consistent with Smalltalk. This results in us having a method forEach(Procedure) defined on all our Map implementations. This will also cause us to have a method forEach(Block block) defined when JDK 8 is released. Having a third overloaded forEach() method will cause a lot of confusion for us. 
> > https://github.com/goldmansachs/gs-collections/blob/master/collections-api/src/main/java/com/gs/collections/api/map/MapIterable.java#L33 > > Hopefully there are no plans to have Map re-defined as Map extends Iterable> in JDK 8. Otherwise this would result in forEach() having to be redefined as forEach>. > > For future reference, Trove defines these methods in THashMap: > > forEachKey(TObjectProcedure procedure) > forEachValue(TObjectProcedure procedure) > forEachEntry(TObjectObjectProcedure procedure) > > GS Collections defines these methods in MapIterable: > > forEach(Procedure procedure) // extended from RichIterable which ultimately extends Iterable > forEachKey(Procedure procedure) > forEachValue(Procedure procedure) > forEachKeyValue(Procedure2 procedure) > > I would suggest adding the other two methods forEachKey(Block) and forEachValue(Block) to Map for JDK 8 even though this will result in more casting at call sites for users of Trove and GS Collections since our methods will become overloads of the equivalent methods on Map. The current recommended workaround as I understand it will be to have our function types extend Block and BiBlock when JDK 8 is released. > > > > From brian.goetz at oracle.com Wed Dec 26 14:55:01 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 26 Dec 2012 17:55:01 -0500 Subject: Spliterator update Message-ID: <50DB8045.4000103@oracle.com> After some offline discussions with Doug, we agreed that the getNaturalSplits method can be eliminated completely (rather than replaced with an isSplittable as suggested), and instead renaming split() as trySplit() which would be allowed to return null if the data source was not splittable. This eliminates the need to spec (and check) interactions between isSplittable and subsequent calls to split. This has been committed to lambda repo. 
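For the record, JDK 8 ultimately kept the plain name Map.forEach(BiConsumer) rather than adopting forEachKeyValue; the two-argument lambda receives each entry's key and value. A quick illustration against the shipped API (class and method names here are illustrative only):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapForEach {
    static String render(Map<String, Integer> m) {
        StringBuilder sb = new StringBuilder();
        // The BiConsumer receives the key and value of each entry, in the
        // map's iteration order (insertion order for LinkedHashMap).
        m.forEach((k, v) -> sb.append(k).append('=').append(v).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        if (!render(m).equals("a=1;b=2;")) throw new AssertionError(render(m));
    }
}
```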
From brian.goetz at oracle.com Wed Dec 26 14:56:13 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 26 Dec 2012 17:56:13 -0500 Subject: cumulate In-Reply-To: <50D60206.5070503@cs.oswego.edu> References: <50D4CC64.4090105@oracle.com> <50D4D120.6070303@univ-mlv.fr> <50D4D527.1010102@oracle.com> <50D60206.5070503@cs.oswego.edu> Message-ID: <50DB808D.8030206@oracle.com> Grabbed and committed. On 12/22/2012 1:55 PM, Doug Lea wrote: > On 12/21/12 16:31, Brian Goetz wrote: >> It's gone. (Well, not gone. Mercurial history is still there.) >> >> I propose this as the replacement: >> >> In Arrays: >> void parallelPrefix(T[], int offset, int length, BinaryOperator); >> void parallelPrefix(int[], int offset, int length, IntBinaryOperator); >> void parallelPrefix(long[], int offset, int length, >> LongBinaryOperator); >> void parallelPrefix(double[], int offset, int length, >> DoubleBinaryOperator); >> > > Actually, to be consistent with Arrays.sort (and other in-place methods > in Arrays), it should use fromIndex, toIndex. The "T" and long versions > pasted below. After Brian grabs and commits some stuff, it should all be in > place.... > > -Doug > > > > /** > * Cumulates in parallel each element of the given array in place, > * using the supplied function. For example if the array initially > * holds {@code [2, 1, 0, 3]} and the operation performs addition, > * then upon return the array holds {@code [2, 3, 3, 6]}. > * Parallel prefix computation is usually more efficient than > * sequential loops for large arrays. > * > * @param array the array, which is modified in-place by this method > * @param op the function to perform cumulations. The function > * must be amenable to left-to-right application through the > * elements of the array, as well as possible left-to-right > * application across segments of the array. 
> */ > public static void parallelPrefix(T[] array, BinaryOperator > op) { > if (array.length > 0) > new ArrayPrefixUtil.CumulateTask > (null, op, array, 0, array.length).invoke(); > } > > /** > * Performs {@link #parallelPrefix(Object[], BinaryOperator)} > * for the given subrange of the array. > * > * @param array the array > * @param fromIndex the index of the first element, inclusive > * @param toIndex the index of the last element, exclusive > * @param op the function to perform cumulations. > * @throws IllegalArgumentException if {@code fromIndex > toIndex} > * @throws ArrayIndexOutOfBoundsException > * if {@code fromIndex < 0} or {@code toIndex > array.length} > */ > public static void parallelPrefix(T[] array, int fromIndex, > int toIndex, > BinaryOperator op) { > checkFromToBounds(array.length, fromIndex, toIndex); > if (fromIndex < toIndex) > new ArrayPrefixUtil.CumulateTask > (null, op, array, fromIndex, toIndex).invoke(); > } > > > /** > * Cumulates in parallel each element of the given array in place, > * using the supplied function. For example if the array initially > * holds {@code [2, 1, 0, 3]} and the operation performs addition, > * then upon return the array holds {@code [2, 3, 3, 6]}. > * Parallel prefix computation is usually more efficient than > * sequential loops for large arrays. > * > * @param array the array, which is modified in-place by this method > * @param op the function to perform cumulations. The function > * must be amenable to left-to-right application through the > * elements of the array, as well as possible left-to-right > * application across segments of the array. > */ > public static void parallelPrefix(long[] array, LongBinaryOperator > op) { > if (array.length > 0) > new ArrayPrefixUtil.LongCumulateTask > (null, op, array, 0, array.length).invoke(); > } > > ... and others similarly... 
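This API shipped essentially as drafted; the javadoc example above ([2, 1, 0, 3] cumulated by addition to [2, 3, 3, 6]) runs directly against java.util.Arrays:

```java
import java.util.Arrays;

public class PrefixDemo {
    public static void main(String[] args) {
        int[] a = {2, 1, 0, 3};
        // Cumulates in place: each element becomes the running "sum" of the
        // prefix ending at that position.
        Arrays.parallelPrefix(a, Integer::sum);
        if (!Arrays.equals(a, new int[] {2, 3, 3, 6}))
            throw new AssertionError(Arrays.toString(a));
    }
}
```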
> > From brian.goetz at oracle.com Thu Dec 27 07:31:43 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 27 Dec 2012 10:31:43 -0500 Subject: Tabulators, reducers, etc Message-ID: <50DC69DF.3000001@oracle.com> Currently we have the following reduce-like methods: T reduce(T zero, BinaryOperator reducer); Optional reduce(BinaryOperator reducer); U reduce(U zero, BiFunction accumulator, BinaryOperator reducer); R mutableReduce(MutableReducer reducer); R mutableReduce(Supplier seedFactory, BiBlock accumulator, BiBlock reducer); R tabulate(Tabulator tabulator); R tabulate(ConcurrentTabulator tabulator); The first two are "real" reduce; the next three are really "fold" (mutable or not), the first tabulate() is trivial sugar around fold, and the last is really sugar around forEach. I think some naming consolidation is in order. From a user's perspective, these are all various flavors of the same thing, whether you call them reduce, summarize, tabulate, accumulate, aggregate, whatever. The argument for calling the first two reduce is that there is a historically consistent meaning for "reduce", so we might as well use the right word. But the others start to get farther afield from that consistent meaning, undermining this benefit. There are a few choices to make here about what we're shooting for. 1: base naming choice. We could: - Call the first two forms reduce, and call all the other forms something like "accumulate". - Call all the forms something like "accumulate". 2: merging vs concurrent. Since the concurrent accumulations are really based on a different primitive (forEach vs reduce), and have very different requirements on the user (operations had better be commutative; target containers had better be concurrent; user had better not care about encounter order), should these be named differently? 3: mutative vs pure functional. Should we distinguish between "pure" reduce and mutable accumulation? 
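For comparison, the JDK 8 API as it eventually shipped answered these questions by keeping reduce for the purely functional forms and naming the mutable forms collect (with Collector playing roughly the role of Accumulator/Tabulator here). A sketch of the three-lambda mutable form against the shipped API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class MutableAccumulate {
    // The supplier creates a fresh container per thread, the accumulator
    // folds elements in, and the combiner merges per-thread containers.
    static List<String> toList(Stream<String> s) {
        return s.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<String> l = toList(Stream.of("x", "y").parallel());
        if (!l.equals(Arrays.asList("x", "y"))) throw new AssertionError(l);
    }
}
```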
One option might be: use "reduce" for the purely functional forms, use accumulate/accumulateConcurrent for the others: T reduce(T zero, BinaryOperator reducer); Optional reduce(BinaryOperator reducer); U reduce(U zero, BiFunction accumulator, BinaryOperator reducer); R accumulate(Accumulator reducer); R accumulate(Supplier seedFactory, BiBlock accumulator, BiBlock reducer); R accumulateConcurrent(ConcurrentAccumulator tabulator); This would let us get rid of the Tabulator abstraction (it is identical to MutableReducer; both get renamed to Accumulator). Separately, with a small crowbar, we could simplify ConcurrentAccumulator down to fitting into existing SAMs, and the top-level abstraction could go away. We would continue to have the same set of combinators for making tabulators, and would likely have concurrent and not flavors for the Map ones (since there's a real choice for the user to make there.) From sam at sampullara.com Thu Dec 27 15:50:28 2012 From: sam at sampullara.com (Sam Pullara) Date: Thu, 27 Dec 2012 18:50:28 -0500 Subject: Tabulators, reducers, etc In-Reply-To: <50DC69DF.3000001@oracle.com> References: <50DC69DF.3000001@oracle.com> Message-ID: I really like this suggested API. I think it would be easier to digest with concrete examples that show that these choices are orthogonal and necessary . Sam On Thu, Dec 27, 2012 at 10:31 AM, Brian Goetz wrote: > > One option might be: use "reduce" for the purely functional forms, use > accumulate/**accumulateConcurrent for the others: > > T reduce(T zero, BinaryOperator reducer); > Optional reduce(BinaryOperator reducer); > U reduce(U zero, BiFunction accumulator, > BinaryOperator reducer); > > R accumulate(Accumulator reducer); > R accumulate(Supplier seedFactory, > BiBlock accumulator, > BiBlock reducer); > > R accumulateConcurrent(**ConcurrentAccumulator tabulator); > > This would let us get rid of the Tabulator abstraction (it is identical > to MutableReducer; both get renamed to Accumulator). 
Separately, with a > small crowbar, we could simplify ConcurrentAccumulator down to fitting into > existing SAMs, and the top-level abstraction could go away. > > We would continue to have the same set of combinators for making > tabulators, and would likely have concurrent and not flavors for the Map > ones (since there's a real choice for the user to make there.) From brian.goetz at oracle.com Thu Dec 27 15:57:52 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 27 Dec 2012 18:57:52 -0500 Subject: Tabulators, reducers, etc In-Reply-To: References: <50DC69DF.3000001@oracle.com> Message-ID: <50DCE080.40402@oracle.com> Good idea, I'll try to write some of those up tomorrow. On 12/27/2012 6:50 PM, Sam Pullara wrote: > I really like this suggested API. I think it would be easier to digest > with concrete examples that show that these choices are orthogonal and > necessary. > > Sam > > On Thu, Dec 27, 2012 at 10:31 AM, Brian Goetz > wrote: > > One option might be: use "reduce" for the purely functional forms, > use accumulate/accumulateConcurrent for the others: > > T reduce(T zero, BinaryOperator reducer); > Optional reduce(BinaryOperator reducer); > U reduce(U zero, BiFunction accumulator, > BinaryOperator reducer); > > R accumulate(Accumulator reducer); > R accumulate(Supplier seedFactory, > BiBlock accumulator, > BiBlock reducer); > > R accumulateConcurrent(ConcurrentAccumulator > tabulator); > > This would let us get rid of the Tabulator abstraction (it is > identical to MutableReducer; both get renamed to Accumulator). 
> > We would continue to have the same set of combinators for making > tabulators, and would likely have concurrent and not flavors for the > Map ones (since there's a real choice for the user to make there.) > From brian.goetz at oracle.com Thu Dec 27 16:47:36 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 27 Dec 2012 19:47:36 -0500 Subject: Tabulators, reducers, etc In-Reply-To: References: <50DC69DF.3000001@oracle.com> Message-ID: <50DCEC28.6050009@oracle.com> > I really like this suggested API. I think it would be easier to digest > with concrete examples that show that these choices are orthogonal and > necessary. > > Sam > > On Thu, Dec 27, 2012 at 10:31 AM, Brian Goetz > wrote: > > One option might be: use "reduce" for the purely functional forms, > use accumulate/accumulateConcurrent for the others: > > T reduce(T zero, BinaryOperator reducer); "Compute the sum of the squares of the first 100 integers." int sumOfSquares = integers().map(x -> x*x) .limit(100) .reduce(0, Integer::sum); where integers() generates an infinite IntStream (or maybe one that stops at MAX_VALUE.) 
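Against the int specialization that eventually shipped, this first example is directly runnable; the sum of the first 100 squares is 100·101·201/6 = 338,350:

```java
import java.util.stream.IntStream;

public class SumOfSquares {
    static int compute() {
        return IntStream.rangeClosed(1, 100)
                        .map(x -> x * x)
                        .sum();  // specialized shorthand for reduce(0, Integer::sum)
    }

    public static void main(String[] args) {
        if (compute() != 338350) throw new AssertionError(compute());
    }
}
```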
You could do this: intRange(0, s1.length()) .parallel() .map(i -> cmp(i)) .reduce(0, (l, r) -> (l != 0) ? l : r); where cmp(i) = Character.compare(s1.charAt(i), s2.charAt(i)); But, using the three-arg form, we can optimize away the irrelevant comparisons: intRange(0, s1.length()) .parallel() .reduce(0, (l, i) -> (l != 0) ? l : cmp(i), (l, r) -> (l != 0) ? l : r); > R accumulate(Supplier seedFactory, > BiBlock accumulator, > BiBlock reducer); This is the mutable version of the previous form, where instead of a seed value, there is a factory to create mutable containers, and instead of functions to compute a new aggregation result, we fold new values into the container. Examples: ArrayList asList = strings.parallel() .accumulate(ArrayList::new, ArrayList::add, // add(t) ArrayList::addAll) // addAll(Collection) String concatted = strings.parallel() .accumulate(StringBuilder::new, StringBuilder::append, // append(String) StringBuilder::append) // append(StringBuilder) .toString(); BitSet bs = numbers.parallel() .accumulate(BitSet::new, BitSet::set, BitSet::or); > R accumulate(Accumulator reducer); This one is a convenient form of the previous one, where instead of specifying three lambdas, we tie them together so they can be reused and/or composed. Accumulator.OfInt TO_BIT_SET = Accumulators.make(BitSet::new, BitSet::set, BitSet::or); BitSet bs = numbers.accumulate(TO_BIT_SET); The reuse part is nice, but the composition part is even more important. With an abstraction for Accumulator, all of our aggregations like groupBy, reduceBy, partition, mapTo, etc, are just accumulations, and it's trivial to cascade them. For example: "Transactions by (buyer, seller)" Map> map = txns.accumulate(groupBy(Txn::buyer, groupBy(Txn::seller))); The inner groupBy returns an Accumulator>; the outer groupBy treats this simply as a downstream reduction, and produces a new Accumulator. 
"Largest transaction by (buyer, seller)" Map> m = txns.accumulate(groupBy(Txn::buyer, groupBy(Txn::seller, greaterOf(comparing(Txn::amount))))); "Profitable and unprofitable transactions by salesman" Map[]> map = txns.accumulate(groupBy(Txn::seller, partition(t -> t.margin() > X))); Here, partition() returns an Accumulator[]>. > R accumulateConcurrent(ConcurrentAccumulator > tabulator); All the above accumulations were order-preserving, and some used mutable but not shared containers. This means that containers have to be merged, which often involves nontrivial copying cost. If you have a concurrent container, AND you don't care about encounter order, AND your reduction functions are commutative (not just associative), you have another choice: shovel things into a concurrent data structure, and hope its contention management is less expensive than merging. Note that this shoveling is really just forEach, not any form of reduce. The name accumulateConcurrent (open to alternate names) makes it clear that you are choosing this mode that has different semantics. However, given a suitable container, all the same aggregations can be done as concurrent/mutative/order-ignoring/commutative accumulations. So: "Transactions by (buyer, seller)" ConcurrentMap> map = txns.accumulateConcurrent(groupBy(ConcurrentHashMap::new, Txn::buyer, groupBy(Txn::seller))); The preference for concurrent in the method name is that without it, it wouldn't be possible to tell whether a given accumulation is concurrent or not. Because the semantics are so different, I think this is a choice we shouldn't encourage brushing under the rug. 
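The concurrent mode described here survived into the shipped API as Collectors.groupingByConcurrent, which shovels entries into one shared ConcurrentMap (forEach-style) instead of merging per-thread maps, so value order within a group is arrival order rather than encounter order. A sketch (the parity classifier is just for illustration):

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcurrentGroup {
    // Groups into a single shared ConcurrentMap; group membership is
    // deterministic, but order within each value list is not guaranteed.
    static ConcurrentMap<Integer, List<Integer>> byParity(Stream<Integer> s) {
        return s.collect(Collectors.groupingByConcurrent(i -> i % 2));
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, List<Integer>> m =
                byParity(Stream.of(1, 2, 3, 4).parallel());
        // Assert only group sizes, since intra-group order is unspecified.
        if (m.get(0).size() != 2 || m.get(1).size() != 2)
            throw new AssertionError(m);
    }
}
```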
Finding another name for "accumulateConcurrent" would also be OK, maybe one that has "forEach" in it, like: map = txns.forEachInto(ConcurrentHashMap::new, groupBy(Txn::buyer)) From brian.goetz at oracle.com Thu Dec 27 18:23:42 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 27 Dec 2012 21:23:42 -0500 Subject: Tabulators -- a catalog Message-ID: <50DD02AE.3040803@oracle.com> Here's a catalog of the currently implemented Tabulators. 1. The groupBy family. Currently, there are 16 of these: { map vs mapMulti } x { explicit factories or not } x { reduce forms } where the reduce forms are: - nothing (classic groupBy, group into collections) - MutableReducer - BinaryOperator // straight reduce - Function + BinaryOperator // map-reduce The MutableReducer form is what we have been calling Accumulator today; since all the tabulators are MutableReducer, the second form is what allows multi-level tabulations. Q: Does the mapMulti variant carry its weight? (It's not a lot of extra code; these extra 8 methods are in total less than 50 lines of code.) Q: Should the mapMulti variant be called something else, like groupByMulti? Q: The first reduce form is classic groupBy; the others are group+reduce. Should they be called groupedReduce / groupedAccumulate for clarity? Examples: // map + no explicit factories + mutable reduce form Map groupBy(Function classifier, Accumulator downstream) // map + explicit factories + classic reduce , M extends Map> Tabulator groupBy(Function classifier, Supplier mapFactory, Supplier rowFactory) { 2. The mappedTo family. These take a Stream and a function T->U and produce a MapLikeThingy. 
Four forms: // basic Tabulator> mappedTo(Function mapper) // with merge function to handle duplicates Tabulator> mappedTo(Function mapper, BinaryOperator mergeFunction) // with map factory > Tabulator mappedTo(Function mapper, Supplier mapSupplier) // with both factory and merge function > Tabulator mappedTo(Function mapper, BinaryOperator mergeFunction, Supplier mapSupplier) Q: is the name good enough? Q: what should be the default merging behavior for the forms without an explicit merger? Throw? 3. Partition. Partitions a stream according to a predicate. Results always are a two-element array of something. Five forms: // Basic Tabulator[]> partition(Predicate predicate) // Explicit factory > Tabulator partition(Predicate predicate, Supplier rowFactory) // Partitioned mutable reduce Tabulator partition(Predicate predicate, MutableReducer downstream) // Partitioned functional reduce Tabulator partition(Predicate predicate, T zero, BinaryOperator reducer) // Partitioned functional map-reduce Tabulator partition(Predicate predicate, T zero, Function mapper, BinaryOperator reducer) All of these implement MutableReducer/Accumulator/Tabulator, which means any are suitable for use as the downstream reducer, allowing all of these to be composed with each other. (Together all of these are about 300 lines of relatively straight-forward code.) More? Fewer? Different? From paul.sandoz at oracle.com Fri Dec 28 05:00:39 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 28 Dec 2012 14:00:39 +0100 Subject: Into In-Reply-To: <50D5EAE4.2080208@oracle.com> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> Message-ID: On Dec 22, 2012, at 6:16 PM, Brian Goetz wrote: > Right. You want to do the upstream stuff in parallel, then you want to do the downstream stuff (a) serially, (b) in the current thread, and probably (c) in encounter order. 
> > So, assume for sake of discussion that we have some form of .toList(), whether as a "native" operation or some sort of reduce/combine/tabulate. Then you can say: > > parallelStream()...toList().forEach(...) > > and the list-building will happen in parallel and then forEach can help sequentially. > > Given that, is there any reason left for sequential()? > Only if we change it so that it is a partial barrier. Paul. From paul.sandoz at oracle.com Fri Dec 28 05:48:32 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 28 Dec 2012 14:48:32 +0100 Subject: Into In-Reply-To: <50DB5575.7020006@univ-mlv.fr> References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <50DB2B48.2040109@univ-mlv.fr> <50DB2EB6.30700@cs.oswego.edu> <50DB3BA3.3000501@univ-mlv.fr> <50DB443E.5010800@oracle.com> <50DB5575.7020006@univ-mlv.fr> Message-ID: Currently we can do this: s.mutableReduce(Reducers.intoCollection(ArrayList::new)) into(...) could be sugar: > into(Supplier collectionFactory) { mutableReduce(Reducers.intoCollection(collectionFactory)); } That works OK for collections but not for stuff like StringJoiner and StringBuilder, so using your proposed Destination: public static> MutableReducer intoDestination(Supplier destinationFactory) { return reducer(destinationFactory, (BiBlock) Destination::add, (BiBlock) Destination::addAll); } > into(Supplier destinationFactory) { mutableReduce(Reducers.intoDestination(destinationFactory)); } Paul. On Dec 26, 2012, at 8:52 PM, Remi Forax wrote: > On 12/26/2012 07:38 PM, Brian Goetz wrote: >> Let's try to separate some things here. 
>> >> There's lots of defending of into() because it is (a) useful and (b) safe. That's all good. But let's see if we can think of these more as functional requirements than as mandating a specific API (whether one that happens to be already implemented, like into(), or the newly proposed ones like toEveryKindOfCollection()). >> >> Into as currently implemented has many negatives, including: >> - Adds conceptual and API surface area -- destinations have to implement Destination, the semantics of into are weird and unique to into >> - Will likely parallelize terribly >> - Doesn't provide the user enough control over how the into'ing is done (seq vs par, order-sensitive vs not) >> >> So let's step back and talk requirements. >> >> I think the only clear functional requirement is that it should be easy to accumulate the result of a stream into a collection or similar container. It should be easy to customize what kind of collection, but also easy to say "give me a reasonable default." Additionally, the performance characteristics should be transparent; users should be able to figure out what's going to happen. >> >> There are lots of other nice-to-haves, such as: >> - Minimize impact on Collection implementations >> - Minimize magic/guessing about the user's intent >> - Support destinations that aren't collections >> - Minimize tight coupling of Stream API to existing Collection APIs >> >> The current into() fails on nearly all of these. > > Brian, you conflate the concept of into() with its current implementation. 
> - Minimize impact on Collection implementations > => fails, because the Destination interface says a destination must have an add(Stream) method; but this is not a requirement -- a destination can use the already existing add/addAll > - Minimize magic/guessing about the user's intent > => fails, due to the implementation > - Support destinations that aren't collections > => works > - Minimize tight coupling of Stream API to existing Collection APIs > => part of the problem is point 1; the other part is that loose coupling will come at a price for users, like requiring them to specify too many parameters for common use cases. > > For me the requirements are (in that order): > - support classical collections in a way that is always thread safe > - support destinations that are not collections > - minimize coupling with the Collection API > > If you want to compare your tabulator with into(), into() has to be correctly specified. I think it should be something like this: > into(Supplier> supplier) > with the interface Destination defined like this: > interface Destination> { > boolean add(T element); > boolean addAll(D destination); > } > > Example of usage: > stream.into(ArrayList::new); // with Collection implements Destination> > > With that, I see your tabulator as a finer-grained API than into(), but one that requires users to supply more information. 
> > R?mi > From forax at univ-mlv.fr Fri Dec 28 06:07:16 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 28 Dec 2012 15:07:16 +0100 Subject: Into In-Reply-To: References: <50D4A181.3050300@oracle.com> <50D4D90D.4060702@oracle.com> <50D5EAE4.2080208@oracle.com> <50D5EE04.1010602@univ-mlv.fr> <50D5F1E4.7020606@oracle.com> <50D6164D.40307@cs.oswego.edu> <50D63317.7020306@univ-mlv.fr> <50D6F2D3.1030409@cs.oswego.edu> <50D71697.5020008@cs.oswego.edu> <50D7243E.5000803@oracle.com> <50DA3A23.8040101@univ-mlv.fr> <50DB12E1.8050408@cs.oswego.edu> <50DB165E.1040200@univ-mlv.fr> <50DB1A77.2080305@cs.oswego.edu> <50DB2B48.2040109@univ-mlv.fr> <50DB2EB6.30700@cs.oswego.edu> <50DB3BA3.3000501@univ-mlv.fr> <50DB443E.5010800@oracle.com> <50DB5575.7020006@univ-mlv.fr> Message-ID: <50DDA794.1060608@univ-mlv.fr> On 12/28/2012 02:48 PM, Paul Sandoz wrote: > Currently we can do this: > > s.mutableReduce(Reducers.intoCollection(ArrayList::new)) > > into(...) could be sugar: > > > C into(Supplier collectionFactory) { > mutableReduce(Reducers.intoCollection(collectionFactory)); > } > > That works OK for collections but not for stuff like StringJoiner and StringBuilder, so using your proposed Destination: > > public static> > MutableReducer intoDestination(Supplier destinationFactory) { > return reducer(destinationFactory, (BiBlock) Destination::add, (BiBlock) Destination::addAll); > } > > > > into(Supplier destinationFactory) { > mutableReduce(Reducers.intoDestination(destinationFactory)); > } yes, into() is a sugar for a merge reducer. The doc has to be changed to say that the supplier can be called more than once that is surprising if you don't think to the parallel case. The only open issue now is to figure out if we need a tabulator/mutable reducer/accumulator object or not. It seems we need it for chaining things groupBy of groupBy, etc. > > Paul. 
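[Editorial note: the sugar relationship Paul and Remi agree on here is essentially what shipped. In the final Java 8 API, into(supplier) became collect(Collectors.toCollection(supplier)), with exactly the caveat raised above: in a parallel run the supplier may be invoked more than once, once per leaf of the computation. A minimal sketch:]

```java
import java.util.*;
import java.util.stream.*;

public class IntoSketch {
    // into(ArrayList::new) in the draft corresponds to
    // collect(toCollection(ArrayList::new)) in the final API; the
    // supplier may be called once per leaf task when run in parallel,
    // and the per-leaf containers are then merged.
    static ArrayList<Integer> evens(List<Integer> xs) {
        return xs.stream()
                 .filter(x -> x % 2 == 0)
                 .collect(Collectors.toCollection(ArrayList::new));
    }
}
```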
R?mi > > On Dec 26, 2012, at 8:52 PM, Remi Forax wrote: > >> On 12/26/2012 07:38 PM, Brian Goetz wrote: >>> Let's try to separate some things here. >>> >>> There's lots of defending of into() because it is (a) useful and (b) safe. That's all good. But let's see if we can think of these more as functional requirements than as mandating a specific API (whether one that happens to be already implemented, like into(), or the newly proposed ones like toEveryKindOfCollection().) >>> >>> Into as currently implemented has many negatives, including: >>> - Adds conceptual and API surface area -- destinations have to implement Destination, the semantics of into are weird and unique to into >>> - Will likely parallelize terribly >>> - Doesn't provide the user enough control over how the into'ing is done (seq vs par, order-sensitive vs not) >>> >>> So let's step back and talk requirements. >>> >>> I think the only clear functional requirement is that it should be easy to accumulate the result of a stream into a collection or similar container. It should be easy to customize what kind of collection, but also easy to say "give me a reasonable default." Additionally, the performance characteristics should be transparent; users should be able to figure out what's going to happen. >>> >>> There are lots of other nice-to-haves, such as: >>> - Minimize impact on Collection implementations >>> - Minimize magic/guessing about the user's intent >>> - Support destinations that aren't collections >>> - Minimize tight coupling of Stream API to existing Collection APIs >>> >>> The current into() fails on nearly all of these. >> Brian, you confound the concept of into() with it's current implementation. 
>> - Minimize impact on Collection implementations >> => fail because destination interface says that destination should have a method add(Stream) >> but this is not a requirement, destination can use already existing add/addAll >> - Minimize magic/guessing about the user's intent >> => fails, due to the implementation >> - Support destinations that aren't collections >> => works >> - Minimize tight coupling of Stream API to existing Collection APIs >> => part of the problem is point 1, the other part is that loosely couple will come with a price for users >> like requiring them to specify too many parameters for common use cases. >> >> for me the requirements are (in that order) >> - support classical collections in a way that is always thread safe >> - support destination that are not collections >> - minimize coupling with Collection API >> >> if you want to compare your tabulator with into, into has to be correctly specified, >> I think, it should be something like this: >> into(Supplier> supplier) >> with the interface Destination defined like this >> interface Destination> { >> boolean add(T element); >> boolean addAll(D destination); >> } >> >> example of usages: >> stream.into(ArrayList::new); // with Collection implements Destination> >> >> with that, I see your tabulator as a more fine grain API that into() but that requires users to send more information. >> >> R?mi >> From forax at univ-mlv.fr Fri Dec 28 06:39:28 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 28 Dec 2012 15:39:28 +0100 Subject: Tabulators -- a catalog In-Reply-To: <50DD02AE.3040803@oracle.com> References: <50DD02AE.3040803@oracle.com> Message-ID: <50DDAF20.9030402@univ-mlv.fr> On 12/28/2012 03:23 AM, Brian Goetz wrote: > Here's a catalog of the currently implemented Tabulators. [...] > 3. Partition. Partitions a stream according to a predicate. Results > always are a two-element array of something. 
Five forms: > > // Basic > Tabulator[]> > partition(Predicate predicate) > > // Explicit factory > > Tabulator > partition(Predicate predicate, > Supplier rowFactory) > > // Partitioned mutable reduce > Tabulator > partition(Predicate predicate, > MutableReducer downstream) > > // Partitioned functional reduce > Tabulator > partition(Predicate predicate, > T zero, > BinaryOperator reducer) > > // Partitioned functional map-reduce > Tabulator > partition(Predicate predicate, > T zero, > Function mapper, > BinaryOperator reducer) You can't create an array of T (C, D) safely, so casting an array of Object to an array of T is maybe acceptable if you control all the access to that array like in collections, but here you export it. R?mi From brian.goetz at oracle.com Fri Dec 28 06:52:30 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 09:52:30 -0500 Subject: Tabulators -- a catalog In-Reply-To: <50DDAF20.9030402@univ-mlv.fr> References: <50DD02AE.3040803@oracle.com> <50DDAF20.9030402@univ-mlv.fr> Message-ID: <50DDB22E.2020309@oracle.com> > You can't create an array of T (C, D) safely, so casting an array of > Object to an array of T is maybe acceptable if you control all the > access to that array like in collections, but here you export it. We do control all access during creation. We instantiate the array, and then we only stick things in it that are statically typed to fit. Once the user gets their hands on it, that's a different story... 
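[Editorial note: the hazard Remi is pointing at can be shown in isolation. A T[] created via an unchecked cast of an Object[] is fine as long as only its creator touches it, but the moment it is returned to code whose static type says String[], the compiler-inserted checkcast fails at run time. A self-contained sketch — not the Tabulators code itself:]

```java
public class GenericArraySketch {
    // The only way to "create" a T[] without a component-type factory:
    // allocate an Object[] and cast. The runtime type stays Object[].
    @SuppressWarnings("unchecked")
    static <T> T[] pair(T a, T b) {
        return (T[]) new Object[] { a, b };
    }

    // Exporting the array is where it breaks: T[] erases to Object[],
    // so the compiler inserts a checkcast to String[] at the call site,
    // which throws ClassCastException at run time.
    static boolean exportFails() {
        try {
            String[] strings = pair("a", "b"); // CCE thrown here
            return strings.length < 0;         // never reached
        } catch (ClassCastException expected) {
            return true;
        }
    }
}
```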
From forax at univ-mlv.fr Fri Dec 28 07:10:49 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 28 Dec 2012 16:10:49 +0100 Subject: Tabulators -- a catalog In-Reply-To: <50DDB22E.2020309@oracle.com> References: <50DD02AE.3040803@oracle.com> <50DDAF20.9030402@univ-mlv.fr> <50DDB22E.2020309@oracle.com> Message-ID: <50DDB679.30503@univ-mlv.fr> On 12/28/2012 03:52 PM, Brian Goetz wrote: >> You can't create an array of T (C, D) safely, so casting an array of >> Object to an array of T is maybe acceptable if you control all the >> access to that array like in collections, but here you export it. > > We do control all access during creation. We instantiate the array, > and then we only stick things in it that are statically typed to fit. > Once the user gets their hands on it, that's a different story... to be crystal clear, this throws a CCE at runtime String[] strings = Tabulators.partition(s -> s.length() %2 == 0, "", (s1, s2) -> s1 + s2).makeAccumulator(); R?mi From brian.goetz at oracle.com Fri Dec 28 07:28:17 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 10:28:17 -0500 Subject: Tabulators -- a catalog In-Reply-To: <50DDAF20.9030402@univ-mlv.fr> References: <50DD02AE.3040803@oracle.com> <50DDAF20.9030402@univ-mlv.fr> Message-ID: <50DDBA91.1090106@oracle.com> So the thing to do here is return Object[] instead of T[] / D[]. Sad, but not terrible. Not important enough to have the user pass in a factory. For want of a Pair... On 12/28/2012 9:39 AM, Remi Forax wrote: > On 12/28/2012 03:23 AM, Brian Goetz wrote: >> Here's a catalog of the currently implemented Tabulators. > > [...] > >> 3. Partition. Partitions a stream according to a predicate. Results >> always are a two-element array of something. 
Five forms: >> >> // Basic >> Tabulator[]> >> partition(Predicate predicate) >> >> // Explicit factory >> > Tabulator >> partition(Predicate predicate, >> Supplier rowFactory) >> >> // Partitioned mutable reduce >> Tabulator >> partition(Predicate predicate, >> MutableReducer downstream) >> >> // Partitioned functional reduce >> Tabulator >> partition(Predicate predicate, >> T zero, >> BinaryOperator reducer) >> >> // Partitioned functional map-reduce >> Tabulator >> partition(Predicate predicate, >> T zero, >> Function mapper, >> BinaryOperator reducer) > > You can't create an array of T (C, D) safely, so casting an array of > Object to an array of T is maybe acceptable if you control all the > access to that array like in collections, but here you export it. > > R?mi > From sam at sampullara.com Fri Dec 28 07:30:43 2012 From: sam at sampullara.com (Sam Pullara) Date: Fri, 28 Dec 2012 10:30:43 -0500 Subject: Tabulators -- a catalog In-Reply-To: <50DDBA91.1090106@oracle.com> References: <50DD02AE.3040803@oracle.com> <50DDAF20.9030402@univ-mlv.fr> <50DDBA91.1090106@oracle.com> Message-ID: Remind me against what exactly is the issue with having a Pair class? Returning an Object[] here is pretty awful isn't it? Sam On Dec 28, 2012, at 10:28 AM, Brian Goetz wrote: > So the thing to do here is return Object[] instead of T[] / D[]. Sad, but not terrible. Not important enough to have the user pass in a factory. For want of a Pair... > > On 12/28/2012 9:39 AM, Remi Forax wrote: >> On 12/28/2012 03:23 AM, Brian Goetz wrote: >>> Here's a catalog of the currently implemented Tabulators. >> >> [...] >> >>> 3. Partition. Partitions a stream according to a predicate. Results >>> always are a two-element array of something. 
Five forms: >>> >>> // Basic >>> Tabulator[]> >>> partition(Predicate predicate) >>> >>> // Explicit factory >>> > Tabulator >>> partition(Predicate predicate, >>> Supplier rowFactory) >>> >>> // Partitioned mutable reduce >>> Tabulator >>> partition(Predicate predicate, >>> MutableReducer downstream) >>> >>> // Partitioned functional reduce >>> Tabulator >>> partition(Predicate predicate, >>> T zero, >>> BinaryOperator reducer) >>> >>> // Partitioned functional map-reduce >>> Tabulator >>> partition(Predicate predicate, >>> T zero, >>> Function mapper, >>> BinaryOperator reducer) >> >> You can't create an array of T (C, D) safely, so casting an array of >> Object to an array of T is maybe acceptable if you control all the >> access to that array like in collections, but here you export it. >> >> R?mi >> From brian.goetz at oracle.com Fri Dec 28 07:38:33 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 10:38:33 -0500 Subject: Tabulators -- a catalog In-Reply-To: References: <50DD02AE.3040803@oracle.com> <50DDAF20.9030402@univ-mlv.fr> <50DDBA91.1090106@oracle.com> Message-ID: <50DDBCF9.9000208@oracle.com> See this thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2010-March/003973.html Some excerpts: Kevin: Pair is only a partial, flawed solution to a special case (n=2) of a very significant problem: the disproportionate complexity of creating value types in Java. I support addressing the underlying problem in Java 8, and not littering the API with dead-end solutions like Pair. CrazyBob: Please don't add Pair. It should never be used in APIs. Adding it to java.util will enable and even encourage its use in APIs. The damage done to future Java APIs will be far worse than a few duplicate copies of Pair (I don't even see that many). I think we'll have a hard time finding use cases to back up this addition. 
Kevin: FYI, here are some examples of types you can look forward to seeing in Java code near you when you have a Pair class available: Pair,List>>> Map>>> Map>>> FJ.EmitFn>>>>> Processor>,Pair,List>>,List>> DoFn>>>,Pair>>>> These are all real examples found in real, live production code (simplified a little). There were only a scant few examples of this... caliber... that did not involve Pair. On 12/28/2012 10:30 AM, Sam Pullara wrote: > Remind me against what exactly is the issue with having a Pair class? Returning an Object[] here is pretty awful isn't it? > > Sam > > On Dec 28, 2012, at 10:28 AM, Brian Goetz wrote: > >> So the thing to do here is return Object[] instead of T[] / D[]. Sad, but not terrible. Not important enough to have the user pass in a factory. For want of a Pair... >> >> On 12/28/2012 9:39 AM, Remi Forax wrote: >>> On 12/28/2012 03:23 AM, Brian Goetz wrote: >>>> Here's a catalog of the currently implemented Tabulators. >>> >>> [...] >>> >>>> 3. Partition. Partitions a stream according to a predicate. Results >>>> always are a two-element array of something. Five forms: >>>> >>>> // Basic >>>> Tabulator[]> >>>> partition(Predicate predicate) >>>> >>>> // Explicit factory >>>> > Tabulator >>>> partition(Predicate predicate, >>>> Supplier rowFactory) >>>> >>>> // Partitioned mutable reduce >>>> Tabulator >>>> partition(Predicate predicate, >>>> MutableReducer downstream) >>>> >>>> // Partitioned functional reduce >>>> Tabulator >>>> partition(Predicate predicate, >>>> T zero, >>>> BinaryOperator reducer) >>>> >>>> // Partitioned functional map-reduce >>>> Tabulator >>>> partition(Predicate predicate, >>>> T zero, >>>> Function mapper, >>>> BinaryOperator reducer) >>> >>> You can't create an array of T (C, D) safely, so casting an array of >>> Object to an array of T is maybe acceptable if you control all the >>> access to that array like in collections, but here you export it. 
>>> >>> R?mi >>> > From forax at univ-mlv.fr Fri Dec 28 07:46:16 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 28 Dec 2012 16:46:16 +0100 Subject: Tabulators -- a catalog In-Reply-To: <50DDBA91.1090106@oracle.com> References: <50DD02AE.3040803@oracle.com> <50DDAF20.9030402@univ-mlv.fr> <50DDBA91.1090106@oracle.com> Message-ID: <50DDBEC8.4080602@univ-mlv.fr> On 12/28/2012 04:28 PM, Brian Goetz wrote: > So the thing to do here is return Object[] instead of T[] / D[]. Sad, > but not terrible. Not important enough to have the user pass in a > factory. For want of a Pair... The other solution is to send a j.u.List with a specific non mutable implementation able to store only two elements. R?mi > > On 12/28/2012 9:39 AM, Remi Forax wrote: >> On 12/28/2012 03:23 AM, Brian Goetz wrote: >>> Here's a catalog of the currently implemented Tabulators. >> >> [...] >> >>> 3. Partition. Partitions a stream according to a predicate. Results >>> always are a two-element array of something. Five forms: >>> >>> // Basic >>> Tabulator[]> >>> partition(Predicate predicate) >>> >>> // Explicit factory >>> > Tabulator >>> partition(Predicate predicate, >>> Supplier rowFactory) >>> >>> // Partitioned mutable reduce >>> Tabulator >>> partition(Predicate predicate, >>> MutableReducer downstream) >>> >>> // Partitioned functional reduce >>> Tabulator >>> partition(Predicate predicate, >>> T zero, >>> BinaryOperator reducer) >>> >>> // Partitioned functional map-reduce >>> Tabulator >>> partition(Predicate predicate, >>> T zero, >>> Function mapper, >>> BinaryOperator reducer) >> >> You can't create an array of T (C, D) safely, so casting an array of >> Object to an array of T is maybe acceptable if you control all the >> access to that array like in collections, but here you export it. 
>> >> R?mi >> From brian.goetz at oracle.com Fri Dec 28 07:50:25 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 10:50:25 -0500 Subject: Tabulators -- a catalog In-Reply-To: <50DDBEC8.4080602@univ-mlv.fr> References: <50DD02AE.3040803@oracle.com> <50DDAF20.9030402@univ-mlv.fr> <50DDBA91.1090106@oracle.com> <50DDBEC8.4080602@univ-mlv.fr> Message-ID: <50DDBFC1.4030103@oracle.com> Seems like overkill :( On 12/28/2012 10:46 AM, Remi Forax wrote: > On 12/28/2012 04:28 PM, Brian Goetz wrote: >> So the thing to do here is return Object[] instead of T[] / D[]. Sad, >> but not terrible. Not important enough to have the user pass in a >> factory. For want of a Pair... > > The other solution is to send a j.u.List with a specific non mutable > implementation able to store only two elements. > > R?mi > >> >> On 12/28/2012 9:39 AM, Remi Forax wrote: >>> On 12/28/2012 03:23 AM, Brian Goetz wrote: >>>> Here's a catalog of the currently implemented Tabulators. >>> >>> [...] >>> >>>> 3. Partition. Partitions a stream according to a predicate. Results >>>> always are a two-element array of something. Five forms: >>>> >>>> // Basic >>>> Tabulator[]> >>>> partition(Predicate predicate) >>>> >>>> // Explicit factory >>>> > Tabulator >>>> partition(Predicate predicate, >>>> Supplier rowFactory) >>>> >>>> // Partitioned mutable reduce >>>> Tabulator >>>> partition(Predicate predicate, >>>> MutableReducer downstream) >>>> >>>> // Partitioned functional reduce >>>> Tabulator >>>> partition(Predicate predicate, >>>> T zero, >>>> BinaryOperator reducer) >>>> >>>> // Partitioned functional map-reduce >>>> Tabulator >>>> partition(Predicate predicate, >>>> T zero, >>>> Function mapper, >>>> BinaryOperator reducer) >>> >>> You can't create an array of T (C, D) safely, so casting an array of >>> Object to an array of T is maybe acceptable if you control all the >>> access to that array like in collections, but here you export it. 
>>> >>> R?mi >>> > From brian.goetz at oracle.com Fri Dec 28 08:20:03 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 11:20:03 -0500 Subject: Cancelation -- use cases Message-ID: <50DDC6B3.8050805@oracle.com> I've been working through some alternatives for cancellation support in infinite streams. Looking to gather some use case background to help evaluate the alternatives. In the serial case, the "gate" approach works fine -- after some criteria transpires, stop sending elements downstream. The pipeline flushes the elements it has, and completes early. In the parallel unordered case, the gate approach similarly works fine -- after the cancelation criteria occurs, no new splits are created, and existing splits dispense no more elements. The computation similarly quiesces after elements currently being processed are completed, possibly along with any up-tree merging to combine results. It is the parallel ordered case that is tricky. Supposing we partition a stream into (a1,a2,a3), (a4,a5,a6) And suppose further we happen to be processing a5 when the bell goes off. Do we want to wait for all a_i, i<5, to finish before letting the computation quiesce? My gut says: for the things we intend to cancel, most of them will be order-insensitive anyway. Things like: - Find the best possible move after thinking for 5 seconds - Find the first solution that is better than X - Gather solutions until we have 100 of them I believe the key use case for cancelation here will be when we are chewing on potentially infinite streams of events (probably backed by IO) where we want to chew until we're asked to shut down, and want to get as much parallelism as we can cheaply. Which suggests to me the intersection between order-sensitive stream pipelines and cancelable stream pipelines is going to be pretty small indeed. Anyone want to add to this model of use cases for cancelation? 
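[Editorial note: of the use cases Brian lists, "gather solutions until we have 100 of them" is the one the eventual API handles directly: an order-respecting limit(n) on an infinite stream acts as exactly this kind of gate, and the pipeline quiesces once n elements have been emitted. The time-based and quality-based cancelation cases discussed in this thread had no direct equivalent in the Java 8 release. A sketch:]

```java
import java.util.stream.*;

public class CancelSketch {
    // "gather solutions until we have n of them": limit(n) terminates an
    // otherwise infinite ordered stream after n elements pass the gate.
    static long[] firstSolutions(long threshold, int n) {
        return LongStream.iterate(1, i -> i + 1)
                         .map(i -> i * i)          // candidate "solutions"
                         .filter(sq -> sq > threshold)
                         .limit(n)
                         .toArray();
    }
}
```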
From Donald.Raab at gs.com Fri Dec 28 08:29:02 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Fri, 28 Dec 2012 11:29:02 -0500 Subject: Tabulators -- a catalog In-Reply-To: <50DDBFC1.4030103@oracle.com> References: <50DD02AE.3040803@oracle.com> <50DDAF20.9030402@univ-mlv.fr> <50DDBA91.1090106@oracle.com> <50DDBEC8.4080602@univ-mlv.fr> <50DDBFC1.4030103@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C0C17E62@GSCMAMP09EX.firmwide.corp.gs.com> This is the route we went. interface PartitionCollection { Collection getPositive(); Collection getNegative(); } More specific than Pair. Less mutative, flexible and annoying than Collection[]. > -----Original Message----- > From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda-libs- > spec-experts-bounces at openjdk.java.net] On Behalf Of Brian Goetz > Sent: Friday, December 28, 2012 10:50 AM > To: Remi Forax > Cc: lambda-libs-spec-experts at openjdk.java.net > Subject: Re: Tabulators -- a catalog > > Seems like overkill :( > > On 12/28/2012 10:46 AM, Remi Forax wrote: > > On 12/28/2012 04:28 PM, Brian Goetz wrote: > >> So the thing to do here is return Object[] instead of T[] / D[]. Sad, > >> but not terrible. Not important enough to have the user pass in a > >> factory. For want of a Pair... > > > > The other solution is to send a j.u.List with a specific non mutable > > implementation able to store only two elements. > > > > R?mi > > > >> > >> On 12/28/2012 9:39 AM, Remi Forax wrote: > >>> On 12/28/2012 03:23 AM, Brian Goetz wrote: > >>>> Here's a catalog of the currently implemented Tabulators. > >>> > >>> [...] > >>> > >>>> 3. Partition. Partitions a stream according to a predicate. > >>>> Results always are a two-element array of something. 
Five forms: > >>>> > >>>> // Basic > >>>> Tabulator[]> > >>>> partition(Predicate predicate) > >>>> > >>>> // Explicit factory > >>>> > Tabulator > >>>> partition(Predicate predicate, > >>>> Supplier rowFactory) > >>>> > >>>> // Partitioned mutable reduce > >>>> Tabulator > >>>> partition(Predicate predicate, > >>>> MutableReducer downstream) > >>>> > >>>> // Partitioned functional reduce > >>>> Tabulator > >>>> partition(Predicate predicate, > >>>> T zero, > >>>> BinaryOperator reducer) > >>>> > >>>> // Partitioned functional map-reduce > >>>> Tabulator > >>>> partition(Predicate predicate, > >>>> T zero, > >>>> Function mapper, > >>>> BinaryOperator reducer) > >>> > >>> You can't create an array of T (C, D) safely, so casting an array of > >>> Object to an array of T is maybe acceptable if you control all the > >>> access to that array like in collections, but here you export it. > >>> > >>> R?mi > >>> > > From joe.bowbeer at gmail.com Fri Dec 28 08:33:57 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Fri, 28 Dec 2012 08:33:57 -0800 Subject: Cancelation -- use cases In-Reply-To: <50DDC6B3.8050805@oracle.com> References: <50DDC6B3.8050805@oracle.com> Message-ID: I think of cancellation as a terminal condition that indicates the results are not needed: The user pushes the cancel button: stop and return control to the user. What you are describing in your a5 case seems like a notification or interrupt that may need to be handled cooperatively in order to preserve the work that has been completed. On Fri, Dec 28, 2012 at 8:20 AM, Brian Goetz wrote: > I've been working through some alternatives for cancellation support in > infinite streams. Looking to gather some use case background to help > evaluate the alternatives. > > In the serial case, the "gate" approach works fine -- after some criteria > transpires, stop sending elements downstream. The pipeline flushes the > elements it has, and completes early. 
> > In the parallel unordered case, the gate approach similarly works fine -- > after the cancelation criteria occurs, no new splits are created, and > existing splits dispense no more elements. The computation similarly > quiesces after elements currently being processed are completed, possibly > along with any up-tree merging to combine results. > > It is the parallel ordered case that is tricky. Supposing we partition a > stream into > (a1,a2,a3), (a4,a5,a6) > > And suppose further we happen to be processing a5 when the bell goes off. > Do we want to wait for all a_i, i<5, to finish before letting the > computation quiesce? > > My gut says: for the things we intend to cancel, most of them will be > order-insensitive anyway. Things like: > > - Find the best possible move after thinking for 5 seconds > - Find the first solution that is better than X > - Gather solutions until we have 100 of them > > I believe the key use case for cancelation here will be when we are > chewing on potentially infinite streams of events (probably backed by IO) > where we want to chew until we're asked to shut down, and want to get as > much parallelism as we can cheaply. Which suggests to me the intersection > between order-sensitive stream pipelines and cancelable stream pipelines is > going to be pretty small indeed. > > Anyone want to add to this model of use cases for cancelation? > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121228/86b19b8b/attachment.html From sam at sampullara.com Fri Dec 28 08:37:35 2012 From: sam at sampullara.com (Sam Pullara) Date: Fri, 28 Dec 2012 11:37:35 -0500 Subject: Cancelation -- use cases In-Reply-To: <50DDC6B3.8050805@oracle.com> References: <50DDC6B3.8050805@oracle.com> Message-ID: I can see that if you were doing an expensive calculation that is an infinite series of terms and you cancel after some condition you may have to keep all the terms that match before the condition. Maybe something like calculating Pi that stops after the term is less than a certain size would be a reasonable example? That could be done in parallel but would need to gather all the terms up to the cut off. Sam On Dec 28, 2012, at 11:20 AM, Brian Goetz wrote: > I've been working through some alternatives for cancellation support in infinite streams. Looking to gather some use case background to help evaluate the alternatives. > > In the serial case, the "gate" approach works fine -- after some criteria transpires, stop sending elements downstream. The pipeline flushes the elements it has, and completes early. > > In the parallel unordered case, the gate approach similarly works fine -- after the cancelation criteria occurs, no new splits are created, and existing splits dispense no more elements. The computation similarly quiesces after elements currently being processed are completed, possibly along with any up-tree merging to combine results. > > It is the parallel ordered case that is tricky. Supposing we partition a stream into > (a1,a2,a3), (a4,a5,a6) > > And suppose further we happen to be processing a5 when the bell goes off. Do we want to wait for all a_i, i<5, to finish before letting the computation quiesce? > > My gut says: for the things we intend to cancel, most of them will be order-insensitive anyway. 
Things like: > > - Find the best possible move after thinking for 5 seconds > - Find the first solution that is better than X > - Gather solutions until we have 100 of them > > I believe the key use case for cancelation here will be when we are chewing on potentially infinite streams of events (probably backed by IO) where we want to chew until we're asked to shut down, and want to get as much parallelism as we can cheaply. Which suggests to me the intersection between order-sensitive stream pipelines and cancelable stream pipelines is going to be pretty small indeed. > > Anyone want to add to this model of use cases for cancelation? > From brian.goetz at oracle.com Fri Dec 28 08:38:46 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 11:38:46 -0500 Subject: Cancelation -- use cases In-Reply-To: References: <50DDC6B3.8050805@oracle.com> Message-ID: <50DDCB16.8080509@oracle.com> So does that mean you think that the a5 case is kind of worrying about something we shouldn't worry about? On 12/28/2012 11:33 AM, Joe Bowbeer wrote: > I think of cancellation as a terminal condition that indicates the > results are not needed: > > The user pushes the cancel button: stop and return control to the user. > > What you are describing in your a5 case seems like a notification or > interrupt that may need to be handled cooperatively in order to preserve > the work that has been completed. > > > > > On Fri, Dec 28, 2012 at 8:20 AM, Brian Goetz > wrote: > > I've been working through some alternatives for cancellation support > in infinite streams. Looking to gather some use case background to > help evaluate the alternatives. > > In the serial case, the "gate" approach works fine -- after some > criteria transpires, stop sending elements downstream. The pipeline > flushes the elements it has, and completes early. 
> > In the parallel unordered case, the gate approach similarly works > fine -- after the cancelation criteria occurs, no new splits are > created, and existing splits dispense no more elements. The > computation similarly quiesces after elements currently being > processed are completed, possibly along with any up-tree merging to > combine results. > > It is the parallel ordered case that is tricky. Supposing we > partition a stream into > (a1,a2,a3), (a4,a5,a6) > > And suppose further we happen to be processing a5 when the bell goes > off. Do we want to wait for all a_i, i<5, to finish before letting > the computation quiesce? > > My gut says: for the things we intend to cancel, most of them will > be order-insensitive anyway. Things like: > > - Find the best possible move after thinking for 5 seconds > - Find the first solution that is better than X > - Gather solutions until we have 100 of them > > I believe the key use case for cancelation here will be when we are > chewing on potentially infinite streams of events (probably backed > by IO) where we want to chew until we're asked to shut down, and > want to get as much parallelism as we can cheaply. Which suggests > to me the intersection between order-sensitive stream pipelines and > cancelable stream pipelines is going to be pretty small indeed. > > Anyone want to add to this model of use cases for cancelation? > > From brian.goetz at oracle.com Fri Dec 28 09:06:38 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 12:06:38 -0500 Subject: Cancelation -- use cases In-Reply-To: References: <50DDC6B3.8050805@oracle.com> Message-ID: <50DDD19E.7010001@oracle.com> Right, or "find me the first N primes / first N solutions to this equation." So, the question is, are these examples outside of what we mean for this 'cancel' facility to be? 
Would a straight limit(n) (which is order-respecting) do the job in this kind of case, freeing up cancel/while to handle only unordered (temporal-based, quality-based) restrictions? Or are problems like "find me as many contiguous primes as you can in 5 minutes" important enough to try to support through streams? (My gut says no. I think people just want to do things like event processing, where you listen for interesting stuff, until you're told to stop listening, at which point you don't care about the information whizzing past.) On 12/28/2012 11:37 AM, Sam Pullara wrote: > I can see that if you were doing an expensive calculation that is an infinite series of terms and you cancel after some condition you may have to keep all the terms that match before the condition. Maybe something like calculating Pi that stops after the term is less than a certain size would be a reasonable example? That could be done in parallel but would need to gather all the terms up to the cut off. > > Sam > > On Dec 28, 2012, at 11:20 AM, Brian Goetz wrote: > >> I've been working through some alternatives for cancellation support in infinite streams. Looking to gather some use case background to help evaluate the alternatives. >> >> In the serial case, the "gate" approach works fine -- after some criteria transpires, stop sending elements downstream. The pipeline flushes the elements it has, and completes early. >> >> In the parallel unordered case, the gate approach similarly works fine -- after the cancelation criteria occurs, no new splits are created, and existing splits dispense no more elements. The computation similarly quiesces after elements currently being processed are completed, possibly along with any up-tree merging to combine results. >> >> It is the parallel ordered case that is tricky. Supposing we partition a stream into >> (a1,a2,a3), (a4,a5,a6) >> >> And suppose further we happen to be processing a5 when the bell goes off. 
Do we want to wait for all a_i, i<5, to finish before letting the computation quiesce? >> >> My gut says: for the things we intend to cancel, most of them will be order-insensitive anyway. Things like: >> >> - Find the best possible move after thinking for 5 seconds >> - Find the first solution that is better than X >> - Gather solutions until we have 100 of them >> >> I believe the key use case for cancelation here will be when we are chewing on potentially infinite streams of events (probably backed by IO) where we want to chew until we're asked to shut down, and want to get as much parallelism as we can cheaply. Which suggests to me the intersection between order-sensitive stream pipelines and cancelable stream pipelines is going to be pretty small indeed. >> >> Anyone want to add to this model of use cases for cancelation? >> > From brian.goetz at oracle.com Fri Dec 28 09:55:24 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 12:55:24 -0500 Subject: Primitive streams Message-ID: <50DDDD0C.7030700@oracle.com> The implementation currently has two versions of streams, reference and integer. Let's checkpoint on the primitive specialization strategy, since it does result in a fair amount of code and API bloat (though not as bad as it looks, since many of the currently public abstractions will be made private.) So, let's start with the argument for specialized streams at all. 1. Boxing costs. Doing calculations like "sum of squares" in boxed world is awful: int sumOfWeights = foos.map(Foo::weight).reduce(0, Integer::sum); Here, all the weights will be boxed and unboxed just to add them up. Figure a 10x performance hit for that in the (many) cases where the VM doesn't save us. It is possible to mitigate this somewhat by having fused mapReduce methods, which we tried early on, such as : foos.mapReduce(Foo::getWeight, 0, Integer::sum) Here, at least now all the reduction is happening in the unboxed domain. 
But the API is now nastier, and while the above is readable, it gets worse in less trivial examples where there are more mapper and reducer lambdas being passed as arguments and it's not obvious which is which. Plus the explosion of mapReduce forms: { Obj,int,long,double } x { reduce forms }. Plus the combination of map, reduce, and fused mapReduce leaves users wondering when they should do which. All to work around boxing. This can be further mitigated by specialized fused operations for the most common reductions: sumBy(IntMapper), maxBy(IntMapper), etc. (Price: more overloads, more "when do I use what" confusion.) So, summary so far: we can mitigate boxing costs by cluttering the API with lots of extra methods. (But I don't think that gets us all the way.) 2. Approachability. Telling Java developers that the way to add up a bunch of numbers is to first recognize that integers form a monoid is likely to make them feel like the guy in this cartoon: http://howfuckedismydatabase.com/nosql/ Reduce is wonderful and powerful and going to confuse the crap out of 80+% of Java developers. (This was driven home to me dramatically when I went on the road with my "Lambdas for Java" talk and saw blank faces when I got to "reduce", even from relatively sophisticated audiences. It took a lot of tweaking -- and explaining -- to get it to the point where I didn't get a room full of blank stares.) Simply put: I believe the letters "s-u-m" have to appear prominently in the API. When people are ready, they can learn to see reduce as a generalization of sum(), but not until they're ready. Forcing them to learn reduce() prematurely will hurt adoption. (The sumBy approach above helps here too, again at a cost.) 3. Numerics. Adding up doubles is not as simple as reducing with Double::sum (unless you don't care about accuracy.) Having methods for numeric sums gives us a place to put such intelligence; general reduce does not. 4. "Primitives all the way down". 
While fused+specialized methods will mitigate many of the above, it only helps at the very end of the chain. It doesn't help things farther up, where we often just want to generate streams of integers and operate on them as integers. Like: intRange(0, 100).map(...).filter(...).sorted().forEach(...) or integers().map(x -> x*x).limit(100).sum() We've currently got a (mostly) complete implementation of integer streams. The actual operation implementations are surprisingly thin, and many can share significant code across stream types (e.g., there's one implementation of MatchOp, with relatively small adapters for Of{Reference,Int,..}). Most of the code bloat is in the internal supporting classes (such as the internal Node classes we use to build conc trees) and the spillover into public interfaces (PrimitiveIterator.Of{Int,Long,Double}). Historically we've shied away from giving users useful tools for operating on primitives because we were afraid of the combinatorial explosion: IntList, IntArrayList, DoubleSortedSynchronizedTreeList, etc. While the explosion exists with streams too, we've managed to limit it to something that is tolerable, and can finally give users some useful tools for working with numeric calculations. We've already limited the explosion to just doing int/long/double instead of the full eight. We could pare further to just long/double, since ints can fit easily into longs and most processors are 64-bit at this point anyway. 
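[Editor's note: the "primitives all the way down" pipelines sketched above use working names (intRange, integers()); the same examples are runnable against the API as it eventually shipped in java.util.stream, where IntStream.range and IntStream.map/sum carry the pipeline in the unboxed domain. A minimal sketch under that assumption:]

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SumOfSquares {
    public static void main(String[] args) {
        // Boxed world: every element is boxed, then unboxed just to add.
        int boxedSum = Stream.of(1, 2, 3).reduce(0, Integer::sum);

        // Primitives all the way down: generate, map, and sum with no
        // boxing anywhere in the pipeline.
        int sumOfSquares = IntStream.range(0, 100)
                                    .map(x -> x * x)
                                    .sum();

        System.out.println(boxedSum);     // 6
        System.out.println(sumOfSquares); // 328350
    }
}
```

Note that sum() here is exactly the fused, specialized reduction the mail argues for: the letters "s-u-m" in the API, with reduce available underneath for those who want the generalization.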
From dl at cs.oswego.edu Fri Dec 28 11:39:42 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 28 Dec 2012 14:39:42 -0500 Subject: Primitive streams In-Reply-To: <50DDDD0C.7030700@oracle.com> References: <50DDDD0C.7030700@oracle.com> Message-ID: <50DDF57E.2060500@cs.oswego.edu> On 12/28/12 12:55, Brian Goetz wrote: > It is possible to mitigate this somewhat by having fused mapReduce methods, > which we tried early on, such as : > > foos.mapReduce(Foo::getWeight, 0, Integer::sum) > > Here, at least now all the reduction is happening in the unboxed domain. But > the API is now nastier, and while the above is readable, it gets worse in less > trivial examples where there are more mapper and reducer lambdas being passed as > arguments and its not obvious which is which. Plus the explosion of mapReduce > forms: { Obj,int,long,double } x { reduce forms }. Plus the combination of map, > reduce, and fused mapReduce leaves users wondering when they should do which. > All to work around boxing. Unsurprisingly (since this is what I rely on in CHM :-), I still think that the stream API need/should not intermix function-composition and aggregate computation. It is sometimes a bit more appealing, but not enough to mangle design for. A small cascade of filter-map-reduce handles all one-pass computations, so long as people use s.filter(and(p1, p2)) rather than s.filter(p1).filter(p2). And so on for map, reduce. -Doug From joe.bowbeer at gmail.com Fri Dec 28 11:41:35 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Fri, 28 Dec 2012 11:41:35 -0800 Subject: Primitive streams In-Reply-To: <50DDDD0C.7030700@oracle.com> References: <50DDDD0C.7030700@oracle.com> Message-ID: I'm appreciating the existence of IntStream and the other primitive streams. IntStream most of all. While many Java programmers are unfamiliar with reduce, there are many FP-aware folks (ruby, groovy, etc) who will want to transfer their favorite expressions to Java. 
We shouldn't go out of our way to make this transfer difficult. Speaking of favorite expressions, how about char streams? A lot of functional kata are char based. But are there real world examples where lack of CharStream would bite? In any event don't lose IntStream. Joe On Dec 28, 2012 9:55 AM, "Brian Goetz" wrote: > The implementation currently has two versions of streams, reference and > integer. Let's checkpoint on the primitive specialization strategy, since > it does result in a fair amount of code and API bloat (though not as bad as > it looks, since many of the currently public abstractions will be made > private.) > > So, let's start with the argument for specialized streams at all. > > 1. Boxing costs. Doing calculations like "sum of squares" in boxed world > is awful: > > int sumOfWeights = foos.map(Foo::weight).reduce(0, Integer::sum); > > Here, all the weights will be boxed and unboxed just to add them up. > Figure a 10x performance hit for that in the (many) cases where the VM > doesn't save us. > > It is possible to mitigate this somewhat by having fused mapReduce > methods, which we tried early on, such as: > > foos.mapReduce(Foo::getWeight, 0, Integer::sum) > > Here, at least now all the reduction is happening in the unboxed domain. > But the API is now nastier, and while the above is readable, it gets worse > in less trivial examples where there are more mapper and reducer lambdas > being passed as arguments and its not obvious which is which. Plus the > explosion of mapReduce forms: { Obj,int,long,double } x { reduce forms }. > Plus the combination of map, reduce, and fused mapReduce leaves users > wondering when they should do which. All to work around boxing. > > This can be further mitigated by specialized fused operations for the most > common reductions: sumBy(IntMapper), maxBy(IntMapper), etc. (Price: more > overloads, more "when do I use what" confusion.) 
> > So, summary so far: we can mitigate boxing costs by cluttering the API > with lots of extra methods. (But I don't think that gets us all the way.) > > > 2. Approachability. Telling Java developers that the way to add up a > bunch of numbers is to first recognize that integers form a monoid is > likely to make them feel like the guy in this cartoon: > > http://howfuckedismydatabase.com/nosql/ > > Reduce is wonderful and powerful and going to confuse the crap out of 80+% > of Java developers. (This was driven home to me dramatically when I went > on the road with my "Lambdas for Java" talk and saw blank faces when I got > to "reduce", even from relatively sophisticated audiences. It took a lot of > tweaking -- and explaining -- to get it to the point where I didn't get a > room full of blank stares.) > > Simply put: I believe the letters "s-u-m" have to appear prominently in > the API. When people are ready, they can learn to see reduce as a > generalization of sum(), but not until they're ready. Forcing them to > learn reduce() prematurely will hurt adoption. (The sumBy approach above > helps here too, again at a cost.) > > > 3. Numerics. Adding up doubles is not as simple as reducing with > Double::sum (unless you don't care about accuracy.) Having methods for > numeric sums gives us a place to put such intelligence; general reduce does > not. > > > 4. "Primitives all the way down". While fused+specialized methods will > mitigate many of the above, it only helps at the very end of the chain. It > doesn't help things farther up, where we often just want to generate > streams of integers and operate on them as integers. Like: > > intRange(0, 100).map(...).filter(...).sorted().forEach(...) > > or > > integers().map(x -> x*x).limit(100).sum() > > > > We've currently got a (mostly) complete implementation of integer streams. 
> The actual operation implementations are surprisingly thin, and many can > share significant code across stream types (e.g., there's one > implementation of MatchOp, with relatively small adapters for > Of{Reference,Int,..}). Where most of the code bloat is is in the internal > supporting classes (such as the internal Node classes we use to build conc > trees) and the spillover into public interfaces (PrimitiveIterator.Of{Int, > Long,Double}). > > Historically we've shied away from giving users useful tools for operating > on primitives because we were afraid of the combinatorial explosion: > IntList, IntArrayList, DoubleSortedSynchronizedTreeList, etc. While > the explosion exists with streams too, we've managed to limit it to > something that is tolerable, and can finally give users some useful tools > for working with numeric calculations. > > > We've already limited the explosion to just doing int/long/double instead > of the full eight. We could pare further to just long/double, since ints > can fit easily into longs and most processors are 64-bit at this point > anyway. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121228/7bbfea3d/attachment.html From brian.goetz at oracle.com Fri Dec 28 11:50:27 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 14:50:27 -0500 Subject: Primitive streams In-Reply-To: References: <50DDDD0C.7030700@oracle.com> Message-ID: <50DDF803.4070704@oracle.com> > While many Java programmers are unfamiliar with reduce, there are many > FP-aware folks (ruby, groovy, etc) who will want to transfer their > favorite expressions to Java. We shouldn't go out of or way to make this > transfer difficult. No, we're not going to make this difficult. Those already familiar with reduce should be pretty happy. The question is, what should we do to accommodate the other 95% of java developers? 
Giving them reduce *only* seems like throwing them in the deep end of the pool. > Speaking of favorite expressions, how about char streams? A lot of > functional kata are char based. But are there real world examples where > lack of CharStream would bite? In any event don't lose IntStream. Currently we expose String.chars() and String.codePoints() as IntStream. If you want to deal with them as chars, you can downcast them to chars easily enough. Doesn't seem like an important enough use case to have a whole 'nother set of streams. (Same with Short, Byte, Float). From joe.darcy at oracle.com Fri Dec 28 12:02:15 2012 From: joe.darcy at oracle.com (Joe Darcy) Date: Fri, 28 Dec 2012 12:02:15 -0800 Subject: Request for review: proposal for @FunctionalInterface checking Message-ID: <50DDFAC7.4030206@oracle.com> Hello, We've had some discussions internally at Oracle about adding a FunctionalInterface annotation type to the platform and we'd now like to get the expert group's evaluation and feedback on the proposal. Just as the java.lang.Override annotation type allows compile-time checking of programmer intent to override a method, the goal for the FunctionalInterface annotation type is to enable analogous compile-time checking of whether or not an interface type is functional. Draft specification: package java.lang; /** Indicates that an interface type declaration is intended to be a functional interface as defined by the Java Language Specification. Conceptually, a functional interface has exactly one abstract method. Since default methods are not abstract, any default methods declared in an interface do not contribute to its abstract method count. If an interface declares a method overriding one of the public methods of java.lang.Object, that also does not count toward the abstract method count. Note that instances of functional interfaces can be created with lambda expressions, method references, or constructor references. 
If a type is annotated with this annotation type, compilers are required to generate an error message unless:
  • The type is an interface type and not an annotation type, enum, or class.
  • The annotated type satisfies the requirements of a functional interface.
@jls 9.8 Functional Interfaces @jls 9.4.3 Interface Method Body @jls 9.6.3.8 FunctionalInterface [Interfaces in the java.lang package get a corresponding JLS section] @since 1.8 */ @Documented @Retention(RUNTIME) @Target(TYPE) @interface FunctionalInterface {} // Marker annotation Annotations on interfaces are *not* inherited, which is the proper semantics in this case. A subinterface of a functional interface can add methods and thus not itself be functional. There are some subtleties to the definition of a functional interface, but I thought that including those by reference to the JLS was sufficient and putting in all the details would be more likely to confuse than clarify. Please send comments by January 4, 2013; thanks, -Joe From brian.goetz at oracle.com Fri Dec 28 12:12:48 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 15:12:48 -0500 Subject: Request for review: proposal for @FunctionalInterface checking In-Reply-To: <50DDFAC7.4030206@oracle.com> References: <50DDFAC7.4030206@oracle.com> Message-ID: <50DDFD40.7010303@oracle.com> Note that this proposal does NOT intend to change the rule that functional interfaces are recognized structurally; single-method interfaces will still be recognized as SAMs. This is more like @Override, where the user can optionally capture design intent and the compiler can warn when said design intent is violated. I support this proposal. On 12/28/2012 3:02 PM, Joe Darcy wrote: > Hello, > > We've had some discussions internally at Oracle about adding a > FunctionalInterface annotation type to the platform and we'd now like to > get the expert group's evaluation and feedback on the proposal. > > Just as the java.lang.Override annotation type allows compile-time > checking of programmer intent to override a method, the goal for the > FunctionalInterface annotation type is to enable analogous compile-time > checking of whether or not an interface type is functional. 
Draft > specification: > > package java.lang; > > /** > Indicates that an interface type declaration is intended to be a > functional interface as defined by the Java Language > Specification. Conceptually, a functional interface has exactly one > abstract method. Since default methods are not abstract, any default > methods declared in an interface do not contribute to its abstract > method count. If an interface declares a method overriding one of the > public methods of java.lang.Object, that also does not count > toward the abstract method count. > > Note that instances of functional interfaces can be created with lambda > expressions, method references, or constructor references. > > If a type is annotated with this annotation type, compilers are required > to generate an error message unless: > >
    >
  • The type is an interface type and not an annotation type, enum, or > class. >
  • The annotated type satisfies the requirements of a functional > interface. >
> > @jls 9.8 Functional Interfaces > @jls 9.4.3 Interface Method Body > @jls 9.6.3.8 FunctionalInterface [Interfaces in the java.lang package > get a corresponding JLS section] > @since 1.8 > */ > @Documented > @Retention(RUNTIME) > @Target(TYPE) > @interface FunctionalInterface {} // Marker annotation > > Annotations on interfaces are *not* inherited, which is the proper > semantics in this case. A subinterface of a functional interface can > add methods and thus not itself be functional. There are some > subtleties to the definition of a functional interface, but I thought > that including those by reference to the JLS was sufficient and putting > in all the details would be more likely to confuse than clarify. > > Please send comments by January 4, 2013; thanks, > > -Joe > From dl at cs.oswego.edu Fri Dec 28 12:14:04 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 28 Dec 2012 15:14:04 -0500 Subject: Tabulators -- a catalog In-Reply-To: <50DD02AE.3040803@oracle.com> References: <50DD02AE.3040803@oracle.com> Message-ID: <50DDFD8C.7090509@cs.oswego.edu> On 12/27/12 21:23, Brian Goetz wrote: > Here's a catalog of the currently implemented Tabulators. Sorry for the protracted grumpiness about Reducers/Tabulators, but even if I ignore my other reservations, MutableReduce remains the which-one-doesn't-belong-here winner. Between all the concerns about making racy/concurrent mutability too easy to do by mistake, and the fact that the implementations should be at least as easy for users to code directly using seq/par forEach loops, I don't see the story behind this? 
-Doug From sam at sampullara.com Fri Dec 28 12:16:01 2012 From: sam at sampullara.com (Sam Pullara) Date: Fri, 28 Dec 2012 15:16:01 -0500 Subject: Request for review: proposal for @FunctionalInterface checking In-Reply-To: <50DDFD40.7010303@oracle.com> References: <50DDFAC7.4030206@oracle.com> <50DDFD40.7010303@oracle.com> Message-ID: Is the intent that an interface that is not functional but marked as such won't compile? Sam On Dec 28, 2012, at 3:12 PM, Brian Goetz wrote: > Note that this proposal does NOT intend to change the rule that functional interfaces are recognized structurally; single-method interfaces will still be recognized as SAMs. This is more like @Override, where the user can optionally capture design intent and the compiler can warn when said design intent is violated. > > I support this proposal. > > On 12/28/2012 3:02 PM, Joe Darcy wrote: >> Hello, >> >> We've had some discussions internally at Oracle about adding a >> FunctionalInterface annotation type to the platform and we'd now like to >> get the expert group's evaluation and feedback on the proposal. >> >> Just as the java.lang.Override annotation type allows compile-time >> checking of programmer intent to override a method, the goal for the >> FunctionalInterface annotation type is to enable analogous compile-time >> checking of whether or not an interface type is functional. Draft >> specification: >> >> package java.lang; >> >> /** >> Indicates that an interface type declaration is intended to be a >> functional interface as defined by the Java Language >> Specification. Conceptually, a functional interface has exactly one >> abstract method. Since default methods are not abstract, any default >> methods declared in an interface do not contribute to its abstract >> method count. If an interface declares a method overriding one of the >> public methods of java.lang.Object, that also does not count >> toward the abstract method count. 
>> >> Note that instances of functional interfaces can be created with lambda >> expressions, method references, or constructor references. >> >> If a type is annotated with this annotation type, compilers are required >> to generate an error message unless: >> >>
    >>
  • The type is an interface type and not an annotation type, enum, or >> class. >>
  • The annotated type satisfies the requirements of a functional >> interface. >>
>> >> @jls 9.8 Functional Interfaces >> @jls 9.4.3 Interface Method Body >> @jls 9.6.3.8 FunctionalInterface [Interfaces in the java.lang package >> get a corresponding JLS section] >> @since 1.8 >> */ >> @Documented >> @Retention(RUNTIME) >> @Target(TYPE) >> @interface FunctionalInterface {} // Marker annotation >> >> Annotations on interfaces are *not* inherited, which is the proper >> semantics in this case. A subinterface of a functional interface can >> add methods and thus not itself be functional. There are some >> subtleties to the definition of a functional interface, but I thought >> that including those by reference to the JLS was sufficient and putting >> in all the details would be more likely to confuse than clarify. >> >> Please send comments by January 4, 2013; thanks, >> >> -Joe >> From brian.goetz at oracle.com Fri Dec 28 12:17:54 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 15:17:54 -0500 Subject: Request for review: proposal for @FunctionalInterface checking In-Reply-To: References: <50DDFAC7.4030206@oracle.com> <50DDFD40.7010303@oracle.com> Message-ID: <50DDFE72.9070401@oracle.com> Yes. If you mark an interface as functional, and it is not, the compiler will warn/error. This prevents action-at-a-distance errors where you have a SAM, other code depends on its SAM-ness, and someone later decides to add another abstract method (or a method to one of its supertypes). It also provide extra documentation value. Basically, just like @Override. On 12/28/2012 3:16 PM, Sam Pullara wrote: > Is the intent that an interface that is not functional but marked as such won't compile? > > Sam > > On Dec 28, 2012, at 3:12 PM, Brian Goetz wrote: > >> Note that this proposal does NOT intend to change the rule that functional interfaces are recognized structurally; single-method interfaces will still be recognized as SAMs. 
This is more like @Override, where the user can optionally capture design intent and the compiler can warn when said design intent is violated. >> >> I support this proposal. >> >> On 12/28/2012 3:02 PM, Joe Darcy wrote: >>> Hello, >>> >>> We've had some discussions internally at Oracle about adding a >>> FunctionalInterface annotation type to the platform and we'd now like to >>> get the expert group's evaluation and feedback on the proposal. >>> >>> Just as the java.lang.Override annotation type allows compile-time >>> checking of programmer intent to override a method, the goal for the >>> FunctionalInterface annotation type is to enable analogous compile-time >>> checking of whether or not an interface type is functional. Draft >>> specification: >>> >>> package java.lang; >>> >>> /** >>> Indicates that an interface type declaration is intended to be a >>> functional interface as defined by the Java Language >>> Specification. Conceptually, a functional interface has exactly one >>> abstract method. Since default methods are not abstract, any default >>> methods declared in an interface do not contribute to its abstract >>> method count. If an interface declares a method overriding one of the >>> public methods of java.lang.Object, that also does not count >>> toward the abstract method count. >>> >>> Note that instances of functional interfaces can be created with lambda >>> expressions, method references, or constructor references. >>> >>> If a type is annotated with this annotation type, compilers are required >>> to generate an error message unless: >>> >>>
    >>>
  • The type is an interface type and not an annotation type, enum, or >>> class. >>>
  • The annotated type satisfies the requirements of a functional >>> interface. >>>
>>> >>> @jls 9.8 Functional Interfaces >>> @jls 9.4.3 Interface Method Body >>> @jls 9.6.3.8 FunctionalInterface [Interfaces in the java.lang package >>> get a corresponding JLS section] >>> @since 1.8 >>> */ >>> @Documented >>> @Retention(RUNTIME) >>> @Target(TYPE) >>> @interface FunctionalInterface {} // Marker annotation >>> >>> Annotations on interfaces are *not* inherited, which is the proper >>> semantics in this case. A subinterface of a functional interface can >>> add methods and thus not itself be functional. There are some >>> subtleties to the definition of a functional interface, but I thought >>> that including those by reference to the JLS was sufficient and putting >>> in all the details would be more likely to confuse than clarify. >>> >>> Please send comments by January 4, 2013; thanks, >>> >>> -Joe >>> > From sam at sampullara.com Fri Dec 28 12:28:52 2012 From: sam at sampullara.com (Sam Pullara) Date: Fri, 28 Dec 2012 15:28:52 -0500 Subject: Request for review: proposal for @FunctionalInterface checking In-Reply-To: <50DDFAC7.4030206@oracle.com> References: <50DDFAC7.4030206@oracle.com> Message-ID: <7523190D-4DB3-48DA-B71A-A9EB6233D8E6@sampullara.com> On Dec 28, 2012, at 3:02 PM, Joe Darcy wrote: > has exactly one abstract method. Since default methods are not abstract, any default methods declared in an interface do not contribute to its abstract method count. If an interface declares a method overriding one of the public methods of java.lang.Object, that also does not count toward the abstract method count. This is pretty murky. 
This works:

    interface Foo {
        @Override
        boolean equals(Object other);
    }

but if you try this

    interface Foo {
        @Override
        default boolean equals(Object other) { return false; }
    }

it does give an error that says I can't override: java: default method equals in interface spullara.Foo overrides a member of java.lang.Object Seems like "override" is the wrong word to use and will likely be confusing since we are explicitly disallowing the second one. Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121228/2ae02b60/attachment.html From brian.goetz at oracle.com Fri Dec 28 12:29:26 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 28 Dec 2012 15:29:26 -0500 Subject: Tabulators -- a catalog In-Reply-To: <50DDFD8C.7090509@cs.oswego.edu> References: <50DD02AE.3040803@oracle.com> <50DDFD8C.7090509@cs.oswego.edu> Message-ID: <50DE0126.1020603@oracle.com> The framework ensures that instances of the mutable objects are not shared while the operation is in place, so this collapses to the same old non-interference requirement as we place on mutable sources like ArrayList. There are lots of things in Java where the only efficient way to do something is mutation. String concat is an example. We could do a functional reduce with String::concat, but it would suck:

    strings.reduce("", String::concat)   // O(n^2) !

We can do a mutable reduce, *safely*, with:

    strings.reduce(StringBuilder::new, StringBuilder::append, StringBuilder::append)

and get much nicer behavior. Encouraging users to use parallel forEach loops is far more likely to result in races/inefficiency than mutable reduction! And in this particular case, string concatenation must respect order (String.concat is associative but not commutative) making parallel forEach the wrong tool. This is naturally a reduction. 
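[Editor's note: the three-argument mutable reduce sketched here is, under the assumption that we map it onto the API as it later shipped, Stream.collect(Supplier, BiConsumer, BiConsumer). A runnable sketch of the StringBuilder example in that final form:]

```java
import java.util.stream.Stream;

public class ConcatDemo {
    public static void main(String[] args) {
        // Mutable reduction: each partition folds its elements into its
        // own StringBuilder (O(n) appends in total), and partial builders
        // are merged in encounter order -- versus O(n^2) for repeated
        // immutable String::concat.
        String joined = Stream.of("a", "b", "c")
                .collect(StringBuilder::new,    // make a fresh container
                         StringBuilder::append, // fold one element in
                         StringBuilder::append) // merge two containers
                .toString();
        System.out.println(joined); // abc
    }
}
```

Because the combiner merges partial containers in encounter order, the same pipeline stays correct if the stream is made parallel, which is exactly the non-interference point made above.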
On 12/28/2012 3:14 PM, Doug Lea wrote: > On 12/27/12 21:23, Brian Goetz wrote: >> Here's a catalog of the currently implemented Tabulators. > > Sorry for the protracted grumpiness about Reducers/Tabulators, > but even if I ignore my other reservations, MutableReduce > remains the which-one-doesn't-belong here-winner. Between all > the concerns about making racy/concurrent mutability too > easy to do by mistake, and the fact that the implementations > should be at least as easy for users to code directly using seq/par > forEach loops, I don't see the story behind this? > > -Doug > From kevinb at google.com Fri Dec 28 12:30:31 2012 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 28 Dec 2012 12:30:31 -0800 Subject: Request for review: proposal for @FunctionalInterface checking In-Reply-To: <50DDFE72.9070401@oracle.com> References: <50DDFAC7.4030206@oracle.com> <50DDFD40.7010303@oracle.com> <50DDFE72.9070401@oracle.com> Message-ID: I see one important difference from @Override. @Override catches errors that might otherwise go completely uncaught. With a type intended to be a functional interface, the moment anyone ever tries to use it as such, there's your compilation error. So I don't see what sets @FunctionalInterface apart from the whole host of static-analysis annotations that we've relegated to the now-abandoned JSR 305. On Fri, Dec 28, 2012 at 12:17 PM, Brian Goetz wrote: > Yes. If you mark an interface as functional, and it is not, the compiler > will warn/error. This prevents action-at-a-distance errors where you have > a SAM, other code depends on its SAM-ness, and someone later decides to add > another abstract method (or a method to one of its supertypes). It also > provide extra documentation value. > > Basically, just like @Override. > > > On 12/28/2012 3:16 PM, Sam Pullara wrote: > >> Is the intent that an interface that is not functional but marked as such >> won't compile? 
>> >> Sam >> >> On Dec 28, 2012, at 3:12 PM, Brian Goetz wrote: >> >> Note that this proposal does NOT intend to change the rule that >>> functional interfaces are recognized structurally; single-method interfaces >>> will still be recognized as SAMs. This is more like @Override, where the >>> user can optionally capture design intent and the compiler can warn when >>> said design intent is violated. >>> >>> I support this proposal. >>> >>> On 12/28/2012 3:02 PM, Joe Darcy wrote: >>> >>>> Hello, >>>> >>>> We've had some discussions internally at Oracle about adding a >>>> FunctionalInterface annotation type to the platform and we'd now like to >>>> get the expert group's evaluation and feedback on the proposal. >>>> >>>> Just as the java.lang.Override annotation type allows compile-time >>>> checking of programmer intent to override a method, the goal for the >>>> FunctionalInterface annotation type is to enable analogous compile-time >>>> checking of whether or not an interface type is functional. Draft >>>> specification: >>>> >>>> package java.lang; >>>> >>>> /** >>>> Indicates that an interface type declaration is intended to be a >>>> functional interface as defined by the Java Language >>>> Specification. Conceptually, a functional interface has exactly one >>>> abstract method. Since default methods are not abstract, any default >>>> methods declared in an interface do not contribute to its abstract >>>> method count. If an interface declares a method overriding one of the >>>> public methods of java.lang.Object, that also does not count >>>> toward the abstract method count. >>>> >>>> Note that instances of functional interfaces can be created with lambda >>>> expressions, method references, or constructor references. >>>> >>>> If a type is annotated with this annotation type, compilers are required >>>> to generate an error message unless: >>>> >>>>
>>>>   • The type is an interface type and not an annotation type, enum, or >>>> class.
>>>>   • The annotated type satisfies the requirements of a functional >>>> interface.
>>>> >>>> @jls 9.8 Functional Interfaces >>>> @jls 9.4.3 Interface Method Body >>>> @jls 9.6.3.8 FunctionalInterface [Interfaces in the java.lang package >>>> get a corresponding JLS section] >>>> @since 1.8 >>>> */ >>>> @Documented >>>> @Retention(RUNTIME) >>>> @Target(TYPE) >>>> @interface FunctionalInterface {} // Marker annotation >>>> >>>> Annotations on interfaces are *not* inherited, which is the proper >>>> semantics in this case. A subinterface of a functional interface can >>>> add methods and thus not itself be functional. There are some >>>> subtleties to the definition of a functional interface, but I thought >>>> that including those by reference to the JLS was sufficient and putting >>>> in all the details would be more likely to confuse than clarify. >>>> >>>> Please send comments by January 4, 2013; thanks, >>>> >>>> -Joe >>>> >>>> >> -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121228/11c52d0c/attachment.html From kevinb at google.com Fri Dec 28 12:36:12 2012 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 28 Dec 2012 12:36:12 -0800 Subject: Request for review: proposal for @FunctionalInterface checking In-Reply-To: References: <50DDFAC7.4030206@oracle.com> <50DDFD40.7010303@oracle.com> <50DDFE72.9070401@oracle.com> Message-ID: (And yes, I understand the bits about action-at-a-distance and documentation value. @Override has those too but *also* so much more value; this doesn't. That still puts it in the same category as most of JSR 305.) Btw what are the costs of adding this annotation? 
- One more thing to learn, making lambdas seem some tiny percentage more complicated - Great potential for user misconception (that it's required), but little harm should come of that - Companies like mine have one more thing to add to our internal style guides (can we use it, should we use it, must we use it?) The only costs I see are pretty small. However, the benefit also looks extremely small to me, and I continue to think it just doesn't quite seem to belong. On Fri, Dec 28, 2012 at 12:30 PM, Kevin Bourrillion wrote: > I see one important difference from @Override. @Override catches errors > that might otherwise go completely uncaught. With a type intended to be a > functional interface, the moment anyone ever tries to use it as such, > there's your compilation error. > > So I don't see what sets @FunctionalInterface apart from the whole host of > static-analysis annotations that we've relegated to the now-abandoned JSR > 305. > > > On Fri, Dec 28, 2012 at 12:17 PM, Brian Goetz wrote: > >> Yes. If you mark an interface as functional, and it is not, the compiler >> will warn/error. This prevents action-at-a-distance errors where you have >> a SAM, other code depends on its SAM-ness, and someone later decides to add >> another abstract method (or a method to one of its supertypes). It also >> provide extra documentation value. >> >> Basically, just like @Override. >> >> >> On 12/28/2012 3:16 PM, Sam Pullara wrote: >> >>> Is the intent that an interface that is not functional but marked as >>> such won't compile? >>> >>> Sam >>> >>> On Dec 28, 2012, at 3:12 PM, Brian Goetz wrote: >>> >>> Note that this proposal does NOT intend to change the rule that >>>> functional interfaces are recognized structurally; single-method interfaces >>>> will still be recognized as SAMs. This is more like @Override, where the >>>> user can optionally capture design intent and the compiler can warn when >>>> said design intent is violated. >>>> >>>> I support this proposal. 
>>>> >>>> On 12/28/2012 3:02 PM, Joe Darcy wrote: >>>> >>>>> Hello, >>>>> >>>>> We've had some discussions internally at Oracle about adding a >>>>> FunctionalInterface annotation type to the platform and we'd now like >>>>> to >>>>> get the expert group's evaluation and feedback on the proposal. >>>>> >>>>> Just as the java.lang.Override annotation type allows compile-time >>>>> checking of programmer intent to override a method, the goal for the >>>>> FunctionalInterface annotation type is to enable analogous compile-time >>>>> checking of whether or not an interface type is functional. Draft >>>>> specification: >>>>> >>>>> package java.lang; >>>>> >>>>> /** >>>>> Indicates that an interface type declaration is intended to be a >>>>> functional interface as defined by the Java Language >>>>> Specification. Conceptually, a functional interface has exactly one >>>>> abstract method. Since default methods are not abstract, any default >>>>> methods declared in an interface do not contribute to its abstract >>>>> method count. If an interface declares a method overriding one of the >>>>> public methods of java.lang.Object, that also does not count >>>>> toward the abstract method count. >>>>> >>>>> Note that instances of functional interfaces can be created with lambda >>>>> expressions, method references, or constructor references. >>>>> >>>>> If a type is annotated with this annotation type, compilers are >>>>> required >>>>> to generate an error message unless: >>>>> >>>>>
>>>>>   • The type is an interface type and not an annotation type, enum, or >>>>> class.
>>>>>   • The annotated type satisfies the requirements of a functional >>>>> interface.
>>>>> >>>>> @jls 9.8 Functional Interfaces >>>>> @jls 9.4.3 Interface Method Body >>>>> @jls 9.6.3.8 FunctionalInterface [Interfaces in the java.lang package >>>>> get a corresponding JLS section] >>>>> @since 1.8 >>>>> */ >>>>> @Documented >>>>> @Retention(RUNTIME) >>>>> @Target(TYPE) >>>>> @interface FunctionalInterface {} // Marker annotation >>>>> >>>>> Annotations on interfaces are *not* inherited, which is the proper >>>>> semantics in this case. A subinterface of a functional interface can >>>>> add methods and thus not itself be functional. There are some >>>>> subtleties to the definition of a functional interface, but I thought >>>>> that including those by reference to the JLS was sufficient and putting >>>>> in all the details would be more likely to confuse than clarify. >>>>> >>>>> Please send comments by January 4, 2013; thanks, >>>>> >>>>> -Joe >>>>> >>>>> >>> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121228/12433c1a/attachment.html From dl at cs.oswego.edu Fri Dec 28 13:32:43 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 28 Dec 2012 16:32:43 -0500 Subject: Tabulators -- a catalog In-Reply-To: <50DE0126.1020603@oracle.com> References: <50DD02AE.3040803@oracle.com> <50DDFD8C.7090509@cs.oswego.edu> <50DE0126.1020603@oracle.com> Message-ID: <50DE0FFB.3000803@cs.oswego.edu> On 12/28/12 15:29, Brian Goetz wrote: > The framework ensures that instances of the mutable objects are not shared while > the operation is in place, Only if the opaque function handing you one does so. This would be too scary for me to use. But then I'm not the target audience so maybe I shouldn't dwell on such things. 
-Doug From joe.darcy at oracle.com Fri Dec 28 13:38:57 2012 From: joe.darcy at oracle.com (Joe Darcy) Date: Fri, 28 Dec 2012 13:38:57 -0800 Subject: Request for review: proposal for @FunctionalInterface checking In-Reply-To: <7523190D-4DB3-48DA-B71A-A9EB6233D8E6@sampullara.com> References: <50DDFAC7.4030206@oracle.com> <7523190D-4DB3-48DA-B71A-A9EB6233D8E6@sampullara.com> Message-ID: <50DE1171.9000909@oracle.com> Hi Sam, On 12/28/2012 12:28 PM, Sam Pullara wrote: > On Dec 28, 2012, at 3:02 PM, Joe Darcy > wrote: >> has exactly one abstract method. Since default methods are not >> abstract, any default methods declared in an interface do not >> contribute to its abstract method count. If an interface declares a >> method overriding one of the public methods of java.lang.Object, that >> also does not count toward the abstract method count. > > This is pretty murky. This works: > > interface Foo { > @Override > boolean equals(Object other); > } > > but if you try this > > interface Foo { > @Override > default boolean equals(Object other) { > return false; > } > } > > it does give an error that says I can't override: > > *java: default method equals in interface spullara.Foo overrides a > member of java.lang.Object* > > Seems like "override" is the wrong word to use and will likely be > confusing since we are explicitly disallowing the second one. > > Yes, I was contemplating whether "override" was the best phrasing to use in the wording above because of this sort of wrinkle. Instead "*abstract* method overriding one of the ..." might help distinguish this particular case. -Joe -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121228/0cc2e100/attachment.html From joe.darcy at oracle.com Fri Dec 28 14:14:51 2012 From: joe.darcy at oracle.com (Joe Darcy) Date: Fri, 28 Dec 2012 14:14:51 -0800 Subject: Request for review: proposal for @FunctionalInterface checking In-Reply-To: References: <50DDFAC7.4030206@oracle.com> <50DDFD40.7010303@oracle.com> <50DDFE72.9070401@oracle.com> Message-ID: <50DE19DB.5020405@oracle.com> On 12/28/2012 12:30 PM, Kevin Bourrillion wrote: > I see one important difference from @Override. @Override catches > errors that might otherwise go completely uncaught. With a type > intended to be a functional interface, the moment anyone ever tries to > use it as such, there's your compilation error. The formal definition of a functional interface from the draft JLS text is: > More precisely, for interface I, let M be the set of abstract methods > that are members of I but that do not have the same signature as any > public instance method of the class Object. Then I is a functional > interface if there exists a method m in M for which the following > conditions hold: > > The signature of m is a subsignature (8.4.2) of every method's > signature in M. > m is return-type-substitutable (8.4.5) for every method in M. which defines both "Z" types below as functional interfaces: interface X { Iterable m(Iterable<String> arg); } interface Y { Iterable<String> m(Iterable arg); } interface Z extends X, Y {} // Functional: Y.m is a subsignature & return-type-substitutable interface X { <T> T execute(Action<T> a); } interface Y { <S> S execute(Action<S> a); } interface Exec extends X, Y {} // Functional: signatures are "the same" I think especially the first of these examples is not immediately obvious and would benefit from an annotation to indicate the intention of the type, if it has such an intention. 
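The counting rules quoted earlier — default methods and methods matching public Object methods do not add to the abstract method count — can be sketched against the annotation as it eventually shipped in java.lang (Transformer is a made-up interface for illustration, not part of any API discussed here):

```java
// Transformer is functional: exactly one abstract method counts toward the total.
@FunctionalInterface
interface Transformer {
    String apply(String s);           // the single abstract method that counts

    @Override
    boolean equals(Object other);     // same signature as a public Object method: does not count

    default String twice(String s) {  // default methods are not abstract: does not count
        return apply(apply(s));
    }
}

public class FunctionalDemo {
    public static void main(String[] args) {
        // The lambda supplies apply(); equals() is inherited from Object.
        Transformer upper = s -> s.toUpperCase();
        System.out.println(upper.twice("ab")); // AB
    }
}
```

This is the same shape as java.util.Comparator, which re-declares equals(Object) yet remains a lambda target.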
> > So I don't see what sets @FunctionalInterface apart from the whole > host of static-analysis annotations that we've relegated to the > now-abandoned JSR 305. One difference is that lambdas are a new-to-Java language construct being retrofitted over existing usage patterns and the proposed annotation type is directly related to the new language feature, unlike the JSR 305 annotation types. Cheers, -Joe > > > On Fri, Dec 28, 2012 at 12:17 PM, Brian Goetz > wrote: > > Yes. If you mark an interface as functional, and it is not, the > compiler will warn/error. This prevents action-at-a-distance > errors where you have a SAM, other code depends on its SAM-ness, > and someone later decides to add another abstract method (or a > method to one of its supertypes). It also provide extra > documentation value. > > Basically, just like @Override. > > > On 12/28/2012 3:16 PM, Sam Pullara wrote: > > Is the intent that an interface that is not functional but > marked as such won't compile? > > Sam > > On Dec 28, 2012, at 3:12 PM, Brian Goetz > > wrote: > > Note that this proposal does NOT intend to change the rule > that functional interfaces are recognized structurally; > single-method interfaces will still be recognized as SAMs. > This is more like @Override, where the user can > optionally capture design intent and the compiler can warn > when said design intent is violated. > > I support this proposal. > > On 12/28/2012 3:02 PM, Joe Darcy wrote: > > Hello, > > We've had some discussions internally at Oracle about > adding a > FunctionalInterface annotation type to the platform > and we'd now like to > get the expert group's evaluation and feedback on the > proposal. > > Just as the java.lang.Override annotation type allows > compile-time > checking of programmer intent to override a method, > the goal for the > FunctionalInterface annotation type is to enable > analogous compile-time > checking of whether or not an interface type is > functional. 
Draft > specification: > > package java.lang; > > /** > Indicates that an interface type declaration is > intended to be a > functional interface as defined by the Java > Language > Specification. Conceptually, a functional interface > has exactly one > abstract method. Since default methods are not > abstract, any default > methods declared in an interface do not contribute to > its abstract > method count. If an interface declares a method > overriding one of the > public methods of java.lang.Object, that also does > not count > toward the abstract method count. > > Note that instances of functional interfaces can be > created with lambda > expressions, method references, or constructor references. > > If a type is annotated with this annotation type, > compilers are required > to generate an error message unless: > >
>   • The type is an interface type and not an > annotation type, enum, or > class.
>   • The annotated type satisfies the requirements of > a functional > interface.
> > @jls 9.8 Functional Interfaces > @jls 9.4.3 Interface Method Body > @jls 9.6.3.8 FunctionalInterface [Interfaces in the > java.lang package > get a corresponding JLS section] > @since 1.8 > */ > @Documented > @Retention(RUNTIME) > @Target(TYPE) > @interface FunctionalInterface {} // Marker annotation > > Annotations on interfaces are *not* inherited, which > is the proper > semantics in this case. A subinterface of a > functional interface can > add methods and thus not itself be functional. There > are some > subtleties to the definition of a functional > interface, but I thought > that including those by reference to the JLS was > sufficient and putting > in all the details would be more likely to confuse > than clarify. > > Please send comments by January 4, 2013; thanks, > > -Joe > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121228/4c5d8e00/attachment.html From Donald.Raab at gs.com Fri Dec 28 15:01:11 2012 From: Donald.Raab at gs.com (Raab, Donald) Date: Fri, 28 Dec 2012 18:01:11 -0500 Subject: Primitive streams In-Reply-To: <50DDF803.4070704@oracle.com> References: <50DDDD0C.7030700@oracle.com> <50DDF803.4070704@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C0C17E90@GSCMAMP09EX.firmwide.corp.gs.com> We thought it would be worthwhile for GSC. Here's an example usage from our Kata in the Order class, getValue() method: return this.lineItems.sumOfDouble(LineItem::getValue); vs. return this.lineItems.stream().map(LineItem::getValue).reduce(0.0, (x, y) -> x + y); The second one does require more awareness and understanding. I understand the second one, but I would use the first one if I needed a sum and it was available on the API. 
I assume sum would look like this on streams if/when you add the Double version of stream: return this.lineItems.stream().map(LineItem::getValue).sum(); and the short-hand would look as follows: return this.lineItems.stream().sumBy(LineItem::getValue); The short-hand version might be easier to discover in the IDE. Eventually, Java developers will wade into the deeper end of the pool with map and reduce. > > While many Java programmers are unfamiliar with reduce, there are many > > FP-aware folks (ruby, groovy, etc) who will want to transfer their > > favorite expressions to Java. We shouldn't go out of our way to make > > this transfer difficult. > > No, we're not going to make this difficult. Those already familiar with > reduce should be pretty happy. > > The question is, what should we do to accommodate the other 95% of java > developers? Giving them reduce *only* seems like throwing them in the deep > end of the pool. > From joe.bowbeer at gmail.com Fri Dec 28 18:01:06 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Fri, 28 Dec 2012 18:01:06 -0800 Subject: Primitive streams In-Reply-To: <50DDF803.4070704@oracle.com> References: <50DDDD0C.7030700@oracle.com> <50DDF803.4070704@oracle.com> Message-ID: ByteStream seems fundamental. Wouldn't it be worthwhile to support this? On Fri, Dec 28, 2012 at 11:50 AM, Brian Goetz wrote: > While many Java programmers are unfamiliar with reduce, there are many >> FP-aware folks (ruby, groovy, etc) who will want to transfer their >> favorite expressions to Java. We shouldn't go out of our way to make this >> transfer difficult. >> > > No, we're not going to make this difficult. Those already familiar with > reduce should be pretty happy. > > The question is, what should we do to accommodate the other 95% of java > developers? Giving them reduce *only* seems like throwing them in the deep > end of the pool. > > > Speaking of favorite expressions, how about char streams? A lot of >> functional kata are char based. 
But are there real world examples where >> lack of CharStream would bite? In any event don't lose IntStream. >> > > Currently we expose > String.chars() > String.codePoints() > > as IntStream. If you want to deal with them as chars, you can downcast > them to chars easily enough. Doesn't seem like an important enough use > case to have a whole 'nother set of streams. (Same with Short, Byte, > Float). > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121228/7214d3b8/attachment.html From dl at cs.oswego.edu Sat Dec 29 05:06:17 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 29 Dec 2012 08:06:17 -0500 Subject: overload patterns/anti-patterns Message-ID: <50DEEAC9.1090301@cs.oswego.edu> As Remy pointed out when I posted first sketch of CompletableFuture, overloading methods on the basis of function type will often force people to add explicit casts on lambdas. Which is not something we'd like to require, especially because the cases are fragile and depend on bits of punctuation. For example "x -> foo()" can be either Block or Function, but "x -> { foo(); }" can only be Block. I had since reworked CompletableFuture to avoid the most common ambiguous shapes in overloads. But a couple remain, and people are already starting to encounter them. My current sense is that, no matter how much lambda-overload matching is tweaked, this will remain a common API design gotcha, and the best advice is to never overload solely on function type. So I'm about to rename some CompletableFuture methods: CF.then(fn) => CF.thenApply(fn) CF.then(runnable) => CF.thenRun(runnable) in turn allowing re-introduction of the doubly-problematic Block form: CF.thenAccept(block). 
Similarly for others, including CF.async(runnable) => CF.runAsync(runnable) I figure that if we are stuck with different method names for different functional forms, then I might as well exploit it here to use in overload stems. Supplier doesn't fit well in this scheme though. The method name "get()" is a terrible overload-stem, so instead: CF.async(Supplier) => CF.supplyAsync(Supplier) (BTW Supplier is lambda-confusable with Runnable). But this also is an argument for changing the method name for Supplier to be "supply". -Doug From brian.goetz at oracle.com Sat Dec 29 06:59:49 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 29 Dec 2012 09:59:49 -0500 Subject: overload patterns/anti-patterns In-Reply-To: <50DEEAC9.1090301@cs.oswego.edu> References: <50DEEAC9.1090301@cs.oswego.edu> Message-ID: <50DF0565.6060602@oracle.com> > My current sense is that, no matter how much lambda-overload > matching is tweaked, this will remain a common API design > gotcha, and the best advice is to never overload solely > on function type. ... where the function types have the same arity. There's no hazard to overloading foo(Function) and foo(BiPredicate) Similarly, it is a key goal to make overloading on specialized return type safe, because we want to support constructions like: Stream map(Function) and IntStream map(IntFunction) From dl at cs.oswego.edu Sat Dec 29 07:10:32 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 29 Dec 2012 10:10:32 -0500 Subject: overload patterns/anti-patterns In-Reply-To: <50DF0565.6060602@oracle.com> References: <50DEEAC9.1090301@cs.oswego.edu> <50DF0565.6060602@oracle.com> Message-ID: <50DF07E8.4090102@cs.oswego.edu> On 12/29/12 09:59, Brian Goetz wrote: >> My current sense is that, no matter how much lambda-overload >> matching is tweaked, this will remain a common API design >> gotcha, and the best advice is to never overload solely >> on function type. > > ... where the function types have the same arity. .. of arguments. 
The "x -> foo()" vs "x -> { foo(); }" issue (and variants) ambiguate cases of no result vs a result. > Similarly, it is a key goal to make overloading on specialized return type safe, Modulo the above? Or is this an important enough goal to require that void actions use braces? -Doug From brian.goetz at oracle.com Sat Dec 29 08:19:14 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 29 Dec 2012 11:19:14 -0500 Subject: overload patterns/anti-patterns In-Reply-To: <50DF07E8.4090102@cs.oswego.edu> References: <50DEEAC9.1090301@cs.oswego.edu> <50DF0565.6060602@oracle.com> <50DF07E8.4090102@cs.oswego.edu> Message-ID: <50DF1802.9000302@oracle.com> >> ... where the function types have the same arity. > > .. of arguments. The "x -> foo()" vs "x -> { foo(); }" > issue (and variants) ambiguate cases of no result vs a result. Right. We discussed this at the EG meeting and the consensus was to coerce non-void to void if the only overload candidate was void (to allow cases like forEach(list::add), but to declare ambiguity if there is both a void and a non-void option. >> Similarly, it is a key goal to make overloading on specialized return >> type safe, > > Modulo the above? Or is this an important enough goal to require > that void actions use braces? We initially tried that -- and people *hated* it. Having to say forEach(e -> { list.add(e); }) instead of forEach(e -> list.add(e)) really, really bothered people. The "overload on specialized return type" is only about boxed vs unboxed, not about void vs value. If there are conflicting overloads of void vs value for the same arity, you'll be disambiguating. Similarly, if there are conflicting overloads on parameter types, you'll be disambiguating. 
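The ambiguity Doug and Brian describe can be reproduced with the function types as they shipped in Java 8 (Consumer is the final name of the Block type mentioned above); this is a sketch with then as a stand-in method, not CompletableFuture's actual API:

```java
import java.util.function.Consumer;
import java.util.function.Function;

public class OverloadDemo {
    // The anti-pattern: two overloads distinguished only by function shape.
    static String then(Function<String, String> f) { return "function"; }
    static String then(Consumer<String> b)         { return "consumer"; }

    public static void main(String[] args) {
        // then(x -> x.trim());  // does not compile: x.trim() produces a value but is
        //                       // also a valid statement, so both overloads apply

        // An explicit cast on the lambda picks the Function overload:
        System.out.println(then((Function<String, String>) x -> x.trim())); // function

        // A statement body is void-compatible only, so only Consumer applies:
        System.out.println(then(x -> { x.trim(); })); // consumer
    }
}
```

Renaming the overloads (thenApply/thenAccept, in Doug's scheme) removes both the error and the need for casts.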
From dl at cs.oswego.edu Sat Dec 29 08:29:13 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 29 Dec 2012 11:29:13 -0500 Subject: overload patterns/anti-patterns In-Reply-To: <50DF1802.9000302@oracle.com> References: <50DEEAC9.1090301@cs.oswego.edu> <50DF0565.6060602@oracle.com> <50DF07E8.4090102@cs.oswego.edu> <50DF1802.9000302@oracle.com> Message-ID: <50DF1A59.4000500@cs.oswego.edu> On 12/29/12 11:19, Brian Goetz wrote: >>> Similarly, it is a key goal to make overloading on specialized return >>> type safe, >> >> Modulo the above? Or is this an important enough goal to require >> that void actions use braces? > > We initially tried that -- and people *hated* it. Right. But in the current pick-your-poison context of also wanting seamless coexistence with primitives, people might hate it less if it saves them from constantly needing to apply casts. This is just a thought; I'm not quite advocating. But I do notice more confusable/confused cases as Int* stuff is added to lambda repo. This is partly just my local problem mixing different javac vs jdk snapshots, but might also be a bad omen. -Doug From joe.darcy at oracle.com Sat Dec 29 09:12:06 2012 From: joe.darcy at oracle.com (Joe Darcy) Date: Sat, 29 Dec 2012 09:12:06 -0800 Subject: Request for review: proposal for @FunctionalInterface checking In-Reply-To: References: <50DDFAC7.4030206@oracle.com> <50DDFD40.7010303@oracle.com> <50DDFE72.9070401@oracle.com> Message-ID: <50DF2466.6000000@oracle.com> On 12/28/2012 12:36 PM, Kevin Bourrillion wrote: > (And yes, I understand the bits about action-at-a-distance and > documentation value. @Override has those too but /also/ so much more > value; this doesn't. That still puts it in the same category as most > of JSR 305.) > > Btw what are the costs of adding this annotation? 
> > * One more thing to learn, making lamdas seem some tiny percentage > more complicated > * Great potential for user misconception (that it's required), but > little harm should come of that > * Companies like mine have one more thing to add to our internal > style guides (can we use it, should we use it, must we use it?) > > The only costs I see are pretty small. However, the benefit also > looks extremely small to me, and I continue to think it just doesn't > quite seem to belong. FWIW, part of adding this annotation type to the JDK would be annotating the affected platform interfaces so that is a cost we value as appropriate for the code base. -Joe > > > On Fri, Dec 28, 2012 at 12:30 PM, Kevin Bourrillion > wrote: > > I see one important difference from @Override. @Override catches > errors that might otherwise go completely uncaught. With a type > intended to be a functional interface, the moment anyone ever > tries to use it as such, there's your compilation error. > > So I don't see what sets @FunctionalInterface apart from the whole > host of static-analysis annotations that we've relegated to the > now-abandoned JSR 305. > > > On Fri, Dec 28, 2012 at 12:17 PM, Brian Goetz > > wrote: > > Yes. If you mark an interface as functional, and it is not, > the compiler will warn/error. This prevents > action-at-a-distance errors where you have a SAM, other code > depends on its SAM-ness, and someone later decides to add > another abstract method (or a method to one of its > supertypes). It also provide extra documentation value. > > Basically, just like @Override. > > > On 12/28/2012 3:16 PM, Sam Pullara wrote: > > Is the intent that an interface that is not functional but > marked as such won't compile? > > Sam > > On Dec 28, 2012, at 3:12 PM, Brian Goetz > > > wrote: > > Note that this proposal does NOT intend to change the > rule that functional interfaces are recognized > structurally; single-method interfaces will still be > recognized as SAMs. 
This is more like @Override, > where the user can optionally capture design intent > and the compiler can warn when said design intent is > violated. > > I support this proposal. > > On 12/28/2012 3:02 PM, Joe Darcy wrote: > > Hello, > > We've had some discussions internally at Oracle > about adding a > FunctionalInterface annotation type to the > platform and we'd now like to > get the expert group's evaluation and feedback on > the proposal. > > Just as the java.lang.Override annotation type > allows compile-time > checking of programmer intent to override a > method, the goal for the > FunctionalInterface annotation type is to enable > analogous compile-time > checking of whether or not an interface type is > functional. Draft > specification: > > package java.lang; > > /** > Indicates that an interface type declaration is > intended to be a > functional interface as defined by the Java > Language > Specification. Conceptually, a functional > interface has exactly one > abstract method. Since default methods are not > abstract, any default > methods declared in an interface do not contribute > to its abstract > method count. If an interface declares a method > overriding one of the > public methods of java.lang.Object, that also does > not count > toward the abstract method count. > > Note that instances of functional interfaces can > be created with lambda > expressions, method references, or constructor > references. > > If a type is annotated with this annotation type, > compilers are required > to generate an error message unless: > >
    >
  • The type is an interface type and not an > annotation type, enum, or > class. >
  • The annotated type satisfies the requirements > of a functional > interface. >
> > @jls 9.8 Functional Interfaces > @jls 9.4.3 Interface Method Body > @jls 9.6.3.8 FunctionalInterface [Interfaces in > the java.lang package > get a corresponding JLS section] > @since 1.8 > */ > @Documented > @Retention(RUNTIME) > @Target(TYPE) > @interface FunctionalInterface {} // Marker annotation > > Annotations on interfaces are *not* inherited, > which is the proper > semantics in this case. A subinterface of a > functional interface can > add methods and thus not itself be functional. > There are some > subtleties to the definition of a functional > interface, but I thought > that including those by reference to the JLS was > sufficient and putting > in all the details would be more likely to confuse > than clarify. > > Please send comments by January 4, 2013; thanks, > > -Joe > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, > Inc. |kevinb at google.com > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121229/0d31de74/attachment-0001.html From forax at univ-mlv.fr Sat Dec 29 10:00:19 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 29 Dec 2012 19:00:19 +0100 Subject: Primitive streams In-Reply-To: <50DDDD0C.7030700@oracle.com> References: <50DDDD0C.7030700@oracle.com> Message-ID: <50DF2FB3.6050405@univ-mlv.fr> On 12/28/2012 06:55 PM, Brian Goetz wrote: > The implementation currently has two versions of streams, reference > and integer. Let's checkpoint on the primitive specialization > strategy, since it does result in a fair amount of code and API bloat > (though not as bad as it looks, since many of the currently public > abstractions will be made private.) > > So, let's start with the argument for specialized streams at all. > > 1. Boxing costs. 
Doing calculations like "sum of squares" in boxed > world is awful: > > int sumOfWeights = foos.map(Foo::weight).reduce(0, Integer::sum); > > Here, all the weights will be boxed and unboxed just to add them up. > Figure a 10x performance hit for that in the (many) cases where the VM > doesn't save us. > > It is possible to mitigate this somewhat by having fused mapReduce > methods, which we tried early on, such as: > > foos.mapReduce(Foo::getWeight, 0, Integer::sum) > > Here, at least now all the reduction is happening in the unboxed > domain. But the API is now nastier, and while the above is readable, > it gets worse in less trivial examples where there are more mapper and > reducer lambdas being passed as arguments and it's not obvious which is > which. Plus the explosion of mapReduce forms: { Obj,int,long,double } > x { reduce forms }. Plus the combination of map, reduce, and fused > mapReduce leaves users wondering when they should do which. All to > work around boxing. > > This can be further mitigated by specialized fused operations for the > most common reductions: sumBy(IntMapper), maxBy(IntMapper), etc. > (Price: more overloads, more "when do I use what" confusion.) > > So, summary so far: we can mitigate boxing costs by cluttering the API > with lots of extra methods. (But I don't think that gets us all the > way.) But given that the inference algorithm and the lambda conversion algorithm don't consider Integer as a boxed int (unlike applicable-method resolution, for example), we need IntFunction, IntOperator, etc. Once we have these specialized functional interfaces, specialized streams are not really a choice. The choice was made long before, when the lambda EG decided how inference/lambda conversion works.
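To make the boxing cost concrete, here is a minimal sketch of both versions of the weight sum, written against the API shape that eventually shipped in java.util.stream (IntStream and mapToInt are assumptions relative to the 2012 draft, which used names like IntMapper; the Foo record and class names are illustrative):

```java
import java.util.List;

public class BoxingDemo {
    record Foo(int weight) {}

    // Boxed path: map produces a Stream<Integer>, so every weight is
    // boxed on the way out of the mapper and unboxed again by Integer::sum.
    static int sumBoxed(List<Foo> foos) {
        return foos.stream().map(Foo::weight).reduce(0, Integer::sum);
    }

    // Primitive path: mapToInt crosses into an IntStream once, and the
    // whole reduction runs on primitive ints.
    static int sumPrimitive(List<Foo> foos) {
        return foos.stream().mapToInt(Foo::weight).sum();
    }

    public static void main(String[] args) {
        List<Foo> foos = List.of(new Foo(3), new Foo(4), new Foo(5));
        System.out.println(sumBoxed(foos));     // 12
        System.out.println(sumPrimitive(foos)); // 12
    }
}
```

Both methods return the same value; the difference is that the boxed version allocates (or fetches from the Integer cache) a wrapper per element, which is the roughly 10x hit described above.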
Now, I dislike fused operations because they go against the DRY principle: the stream interface should be as simple as possible, so an operation should never be a compound of several simpler ones. And while the pipeline can hardly optimize boxed operations into primitive ones, fusing operations for performance inside the pipeline implementation is easy. So instead of stream.sumBy(IntMapper), we already have stream.map(IntMapper).sum(); if the pipeline prefers to use a fused operation internally, that's an implementation detail. > > > 2. Approachability. Telling Java developers that the way to add up a > bunch of numbers is to first recognize that integers form a monoid is > likely to make them feel like the guy in this cartoon: > > http://howfuckedismydatabase.com/nosql/ > > Reduce is wonderful and powerful and going to confuse the crap out of > 80+% of Java developers. (This was driven home to me dramatically > when I went on the road with my "Lambdas for Java" talk and saw blank > faces when I got to "reduce", even from relatively sophisticated > audiences. It took a lot of tweaking -- and explaining -- to get it to > the point where I didn't get a room full of blank stares.) > > Simply put: I believe the letters "s-u-m" have to appear prominently > in the API. When people are ready, they can learn to see reduce as a > generalization of sum(), but not until they're ready. Forcing them to > learn reduce() prematurely will hurt adoption. (The sumBy approach > above helps here too, again at a cost.) Yes, we need sum. > > > 3. Numerics. Adding up doubles is not as simple as reducing with > Double::sum (unless you don't care about accuracy.) Having methods > for numeric sums gives us a place to put such intelligence; general > reduce does not. I'm always afraid when someone tries to put "intelligence" in a program. We never have the same notion of what it should do.
While fused+specialized methods > will mitigate many of the above, it only helps at the very end of the > chain. It doesn't help things farther up, where we often just want to > generate streams of integers and operate on them as integers. Like: > > intRange(0, 100).map(...).filter(...).sorted().forEach(...) > > or > > integers().map(x -> x*x).limit(100).sum() > > > > We've currently got a (mostly) complete implementation of integer > streams. The actual operation implementations are surprisingly thin, > and many can share significant code across stream types (e.g., there's > one implementation of MatchOp, with relatively small adapters for > Of{Reference,Int,..}). Most of the code bloat is in the > internal supporting classes (such as the internal Node classes we use > to build conc trees) and the spillover into public interfaces > (PrimitiveIterator.Of{Int,Long,Double}). Correct me if I'm wrong, but PrimitiveIterator is just here as the escape hatch; it's a huge cost for something that will not be used very often. I'm not sure we should provide these public interfaces; let's see if we can't do better for Java 9. > > Historically we've shied away from giving users useful tools for > operating on primitives because we were afraid of the combinatorial > explosion: IntList, IntArrayList, DoubleSortedSynchronizedTreeList, > etc. While the explosion exists with streams too, we've managed to > limit it to something that is tolerable, and can finally give users > some useful tools for working with numeric calculations. > > > We've already limited the explosion to just doing int/long/double > instead of the full eight. We could pare further to just long/double, > since ints can fit easily into longs and most processors are 64-bit at > this point anyway. > Processors are 64-bit, but using ints is still faster than longs because there is less bus traffic.
Rémi From brian.goetz at oracle.com Sat Dec 29 10:19:57 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 29 Dec 2012 13:19:57 -0500 Subject: Primitive streams In-Reply-To: <50DF2FB3.6050405@univ-mlv.fr> References: <50DDDD0C.7030700@oracle.com> <50DF2FB3.6050405@univ-mlv.fr> Message-ID: <50DF344D.90208@oracle.com> > So instead of stream.sumBy(IntMapper), we already have: > stream.map(IntMapper).sum(). If the pipeline prefer to use fuzed > operation, it's an implementation detail. If we have IntStream, then yes this is most natural. The sumBy suggestion (Don's, I believe) was an alternative in the event we do not have IntStream; you can't define sum() on Stream but you can define sumBy(IntMapper) on Stream. From brian.goetz at oracle.com Sat Dec 29 10:25:38 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 29 Dec 2012 13:25:38 -0500 Subject: Primitive streams In-Reply-To: <50DF2FB3.6050405@univ-mlv.fr> References: <50DDDD0C.7030700@oracle.com> <50DF2FB3.6050405@univ-mlv.fr> Message-ID: <50DF35A2.6050807@oracle.com> Summary: Remi says: - Yes, unfortunately we need primitive streams - Given primitive streams, fused ops are just extra complexity - Dropping IntStream (simulating with LongStream, as we do for short, byte, and char) is a questionable economy Other opinions? On 12/29/2012 1:00 PM, Remi Forax wrote: > On 12/28/2012 06:55 PM, Brian Goetz wrote: >> The implementation currently has two versions of streams, reference >> and integer. Let's checkpoint on the primitive specialization >> strategy, since it does result in a fair amount of code and API bloat >> (though not as bad as it looks, since many of the currently public >> abstractions will be made private.) >> >> So, let's start with the argument for specialized streams at all. >> >> 1. Boxing costs.
Doing calculations like "sum of squares" in boxed >> world is awful: >> >> int sumOfWeights = foos.map(Foo::weight).reduce(0, Integer::sum); >> >> Here, all the weights will be boxed and unboxed just to add them up. >> Figure a 10x performance hit for that in the (many) cases where the VM >> doesn't save us. >> >> It is possible to mitigate this somewhat by having fused mapReduce >> methods, which we tried early on, such as : >> >> foos.mapReduce(Foo::getWeight, 0, Integer::sum) >> >> Here, at least now all the reduction is happening in the unboxed >> domain. But the API is now nastier, and while the above is readable, >> it gets worse in less trivial examples where there are more mapper and >> reducer lambdas being passed as arguments and its not obvious which is >> which. Plus the explosion of mapReduce forms: { Obj,int,long,double } >> x { reduce forms }. Plus the combination of map, reduce, and fused >> mapReduce leaves users wondering when they should do which. All to >> work around boxing. >> >> This can be further mitigated by specialized fused operations for the >> most common reductions: sumBy(IntMapper), maxBy(IntMapper), etc. >> (Price: more overloads, more "when do I use what" confusion.) >> >> So, summary so far: we can mitigate boxing costs by cluttering the API >> with lots of extra methods. (But I don't think that gets us all the >> way.) > > But given that the inference algorithm and the lambda conversion > algorithm don't consider Integer as a boxed int (unlike applicable > method resolution by example), > we need IntFunction, IntOperator etc. > If we have these specialized function interfaces, having specialized > stream is not as if we have a choice. > The choice was done long before, when the lambda EG decide how > inference/lambda conversion works. 
> > Now, I dislike fused operations because it goes against the DRY > principle, the stream interface should be as simple as possible, so an > operation should never be a compound of several ones and while the > pipeline can hardly optimize to transform boxing to primitive operation, > fuzing operations for performance inside the pipeline implementation is > easy. > So instead of stream.sumBy(IntMapper), we already have: > stream.map(IntMapper).sum(). If the pipeline prefer to use fuzed > operation, it's an implementation detail. > >> >> >> 2. Approachability. Telling Java developers that the way to add up a >> bunch of numbers is to first recognize that integers form a monoid is >> likely to make them feel like the guy in this cartoon: >> >> http://howfuckedismydatabase.com/nosql/ >> >> Reduce is wonderful and powerful and going to confuse the crap out of >> 80+% of Java developers. (This was driven home to me dramatically >> when I went on the road with my "Lambdas for Java" talk and saw blank >> faces when I got to "reduce", even from relatively sophisticated >> audiences. It took a lot of tweaking -- and explaining -- to get it to >> the point where I didn't get a room full of blank stares.) >> >> Simply put: I believe the letters "s-u-m" have to appear prominently >> in the API. When people are ready, they can learn to see reduce as a >> generalization of sum(), but not until they're ready. Forcing them to >> learn reduce() prematurely will hurt adoption. (The sumBy approach >> above helps here too, again at a cost.) > > yes, we need sum. > >> >> >> 3. Numerics. Adding up doubles is not as simple as reducing with >> Double::sum (unless you don't care about accuracy.) Having methods >> for numeric sums gives us a place to put such intelligence; general >> reduce does not. > > I'm always afraid when someone try to put "intelligence" in a program. > We never have the same. > >> >> >> 4. "Primitives all the way down". 
While fused+specialized methods >> will mitigate many of the above, it only helps at the very end of the >> chain. It doesn't help things farther up, where we often just want to >> generate streams of integers and operate on them as integers. Like: >> >> intRange(0, 100).map(...).filter(...).sorted().forEach(...) >> >> or >> >> integers().map(x -> x*x).limit(100).sum() >> >> >> >> We've currently got a (mostly) complete implementation of integer >> streams. The actual operation implementations are surprisingly thin, >> and many can share significant code across stream types (e.g., there's >> one implementation of MatchOp, with relatively small adapters for >> Of{Reference,Int,..}). Where most of the code bloat is is in the >> internal supporting classes (such as the internal Node classes we use >> to build conc trees) and the spillover into public interfaces >> (PrimitiveIterator.Of{Int,Long,Double}). > > Correctly if i'm wrong but PrimitiveIterator are just here because of > the escape hatch, it's a huge cost for something that will not be used > very often. > I'm not sure we should provide these public interface and see if you can > not do better for Java 9. > >> >> Historically we've shied away from giving users useful tools for >> operating on primitives because we were afraid of the combinatorial >> explosion: IntList, IntArrayList, DoubleSortedSynchronizedTreeList, >> etc. While the explosion exists with streams too, we've managed to >> limit it to something that is tolerable, and can finally give users >> some useful tools for working with numeric calculations. >> >> >> We've already limited the explosion to just doing int/long/double >> instead of the full eight. We could pare further to just long/double, >> since ints can fit easily into longs and most processors are 64-bit at >> this point anyway. >> > > processor are 64bits but using ints is still faster than long because > there is less bus traffic. 
> > Rémi > From joe.bowbeer at gmail.com Sat Dec 29 10:36:26 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 29 Dec 2012 10:36:26 -0800 Subject: Primitive streams In-Reply-To: <50DF35A2.6050807@oracle.com> References: <50DDDD0C.7030700@oracle.com> <50DF2FB3.6050405@univ-mlv.fr> <50DF35A2.6050807@oracle.com> Message-ID: > (simulating with LongStream, as we do for short, byte, and char) Because the other primitive types listed are all currently simulated with *IntStream*, dropping IntStream would be especially painful in terms of memory footprint. More painful is that there are many methods (not to mention array indexing) that are tied to the int primitive type, and longs won't fit there without some sort of explicit down-casting, which will probably be of concern to static analysis tools (FindBugs). Joe On Sat, Dec 29, 2012 at 10:25 AM, Brian Goetz wrote: > Summary: Remi says: > > - Yes, unfortunately we need primitive streams > - Given primitive streams, fused ops are just extra complexity > - Dropping IntStream (simulating with LongStream, as we do for short, > byte, and char) is a questionable economy > > Other opinions? > > > On 12/29/2012 1:00 PM, Remi Forax wrote: > >> On 12/28/2012 06:55 PM, Brian Goetz wrote: >> >>> The implementation currently has two versions of streams, reference >>> and integer. Let's checkpoint on the primitive specialization >>> strategy, since it does result in a fair amount of code and API bloat >>> (though not as bad as it looks, since many of the currently public >>> abstractions will be made private.) >>> >>> So, let's start with the argument for specialized streams at all. >>> >>> 1. Boxing costs. Doing calculations like "sum of squares" in boxed >>> world is awful: >>> >>> int sumOfWeights = foos.map(Foo::weight).reduce(0, Integer::sum); >>> >>> Here, all the weights will be boxed and unboxed just to add them up. >>> Figure a 10x performance hit for that in the (many) cases where the VM >>> doesn't save us.
>>> >>> It is possible to mitigate this somewhat by having fused mapReduce >>> methods, which we tried early on, such as : >>> >>> foos.mapReduce(Foo::getWeight, 0, Integer::sum) >>> >>> Here, at least now all the reduction is happening in the unboxed >>> domain. But the API is now nastier, and while the above is readable, >>> it gets worse in less trivial examples where there are more mapper and >>> reducer lambdas being passed as arguments and its not obvious which is >>> which. Plus the explosion of mapReduce forms: { Obj,int,long,double } >>> x { reduce forms }. Plus the combination of map, reduce, and fused >>> mapReduce leaves users wondering when they should do which. All to >>> work around boxing. >>> >>> This can be further mitigated by specialized fused operations for the >>> most common reductions: sumBy(IntMapper), maxBy(IntMapper), etc. >>> (Price: more overloads, more "when do I use what" confusion.) >>> >>> So, summary so far: we can mitigate boxing costs by cluttering the API >>> with lots of extra methods. (But I don't think that gets us all the >>> way.) >>> >> >> But given that the inference algorithm and the lambda conversion >> algorithm don't consider Integer as a boxed int (unlike applicable >> method resolution by example), >> we need IntFunction, IntOperator etc. >> If we have these specialized function interfaces, having specialized >> stream is not as if we have a choice. >> The choice was done long before, when the lambda EG decide how >> inference/lambda conversion works. >> >> Now, I dislike fused operations because it goes against the DRY >> principle, the stream interface should be as simple as possible, so an >> operation should never be a compound of several ones and while the >> pipeline can hardly optimize to transform boxing to primitive operation, >> fuzing operations for performance inside the pipeline implementation is >> easy. >> So instead of stream.sumBy(IntMapper), we already have: >> stream.map(IntMapper).sum(). 
If the pipeline prefer to use fuzed >> operation, it's an implementation detail. >> >> >>> >>> 2. Approachability. Telling Java developers that the way to add up a >>> bunch of numbers is to first recognize that integers form a monoid is >>> likely to make them feel like the guy in this cartoon: >>> >>> http://howfuckedismydatabase.com/nosql/ >>> >>> Reduce is wonderful and powerful and going to confuse the crap out of >>> 80+% of Java developers. (This was driven home to me dramatically >>> when I went on the road with my "Lambdas for Java" talk and saw blank >>> faces when I got to "reduce", even from relatively sophisticated >>> audiences. It took a lot of tweaking -- and explaining -- to get it to >>> the point where I didn't get a room full of blank stares.) >>> >>> Simply put: I believe the letters "s-u-m" have to appear prominently >>> in the API. When people are ready, they can learn to see reduce as a >>> generalization of sum(), but not until they're ready. Forcing them to >>> learn reduce() prematurely will hurt adoption. (The sumBy approach >>> above helps here too, again at a cost.) >>> >> >> yes, we need sum. >> >> >>> >>> 3. Numerics. Adding up doubles is not as simple as reducing with >>> Double::sum (unless you don't care about accuracy.) Having methods >>> for numeric sums gives us a place to put such intelligence; general >>> reduce does not. >>> >> >> I'm always afraid when someone try to put "intelligence" in a program. >> We never have the same. >> >> >>> >>> 4. "Primitives all the way down". While fused+specialized methods >>> will mitigate many of the above, it only helps at the very end of the >>> chain. It doesn't help things farther up, where we often just want to >>> generate streams of integers and operate on them as integers. Like: >>> >>> intRange(0, 100).map(...).filter(...).sorted().forEach(...)
>>> >>> or >>> >>> integers().map(x -> x*x).limit(100).sum() >>> >>> >>> >>> We've currently got a (mostly) complete implementation of integer >>> streams. The actual operation implementations are surprisingly thin, >>> and many can share significant code across stream types (e.g., there's >>> one implementation of MatchOp, with relatively small adapters for >>> Of{Reference,Int,..}). Where most of the code bloat is is in the >>> internal supporting classes (such as the internal Node classes we use >>> to build conc trees) and the spillover into public interfaces >>> (PrimitiveIterator.Of{Int,Long,Double}). >>> >> >> Correctly if i'm wrong but PrimitiveIterator are just here because of >> the escape hatch, it's a huge cost for something that will not be used >> very often. >> I'm not sure we should provide these public interface and see if you can >> not do better for Java 9. >> >> >>> >>> Historically we've shied away from giving users useful tools for >>> operating on primitives because we were afraid of the combinatorial >>> explosion: IntList, IntArrayList, DoubleSortedSynchronizedTreeList, >>> etc. While the explosion exists with streams too, we've managed to >>> limit it to something that is tolerable, and can finally give users >>> some useful tools for working with numeric calculations. >>> >>> >>> We've already limited the explosion to just doing int/long/double >>> instead of the full eight. We could pare further to just long/double, >>> since ints can fit easily into longs and most processors are 64-bit at >>> this point anyway. >>> >>> >> processor are 64bits but using ints is still faster than long because >> there is less bus traffic. >> >> Rémi >> >> -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121229/ab59df02/attachment-0001.html From dl at cs.oswego.edu Sat Dec 29 10:38:06 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 29 Dec 2012 13:38:06 -0500 Subject: Primitive streams In-Reply-To: <50DF2FB3.6050405@univ-mlv.fr> References: <50DDDD0C.7030700@oracle.com> <50DF2FB3.6050405@univ-mlv.fr> Message-ID: <50DF388E.1050409@cs.oswego.edu> On 12/29/12 13:00, Remi Forax wrote: > Now, I dislike fused operations because it goes against the DRY principle, the > stream interface should be as simple as possible, Two different reactions: 1. Why do Streams combine function composition and aggregate computation when it would be simpler not to? (answer: prettier look-and-feel?) 2. Why does every other map-reduce framework support mapReduce? (answer: to reflect the fact that map-reduce is a fused concept for most people?) -Doug From forax at univ-mlv.fr Sat Dec 29 11:14:46 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 29 Dec 2012 20:14:46 +0100 Subject: Primitive streams In-Reply-To: <50DF388E.1050409@cs.oswego.edu> References: <50DDDD0C.7030700@oracle.com> <50DF2FB3.6050405@univ-mlv.fr> <50DF388E.1050409@cs.oswego.edu> Message-ID: <50DF4126.1050608@univ-mlv.fr> On 12/29/2012 07:38 PM, Doug Lea wrote: > On 12/29/12 13:00, Remi Forax wrote: > >> Now, I dislike fused operations because it goes against the DRY >> principle, the >> stream interface should be as simple as possible, > > Two different reactions: > > 1. Why do Streams combine function composition and aggregate computation > when it would be simpler not to? > (answer: prettier look-and-feel?) > > 2. Why does every other map-reduce framework support mapReduce? > (answer: to reflect the fact that map-reduce is a fused concept for > most people?) I don't think Clojure has a mapReduce; neither does Python nor Ruby. Of course, MapReduce exists as a programming model for distributed computing.
> > -Doug > Rémi From forax at univ-mlv.fr Sun Dec 30 05:12:39 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 30 Dec 2012 14:12:39 +0100 Subject: Tabulators -- a catalog In-Reply-To: <6712820CB52CFB4D842561213A77C05404C0C17E62@GSCMAMP09EX.firmwide.corp.gs.com> References: <50DD02AE.3040803@oracle.com> <50DDAF20.9030402@univ-mlv.fr> <50DDBA91.1090106@oracle.com> <50DDBEC8.4080602@univ-mlv.fr> <50DDBFC1.4030103@oracle.com> <6712820CB52CFB4D842561213A77C05404C0C17E62@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <50E03DC7.1010601@univ-mlv.fr> On 12/28/2012 05:29 PM, Raab, Donald wrote: > This is the route we went. > > interface PartitionCollection > { > Collection getPositive(); > Collection getNegative(); > } > > More specific than Pair. Less mutative, flexible and annoying than Collection[]. Yes, but because Tabulators also do a reduce when partitioning, we also need something which is conceptually a pair of values (a pair of reduced values) and not only a pair of collections. Tabulators have a lot of methods that are just compositions of several simple methods (map/reduce) that already exist, so I think we should remove them and only keep the ones that allow chaining groupBy and maybe reduceBy. Rémi > >> -----Original Message----- >> From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda-libs- >> spec-experts-bounces at openjdk.java.net] On Behalf Of Brian Goetz >> Sent: Friday, December 28, 2012 10:50 AM >> To: Remi Forax >> Cc: lambda-libs-spec-experts at openjdk.java.net >> Subject: Re: Tabulators -- a catalog >> >> Seems like overkill :( >> >> On 12/28/2012 10:46 AM, Remi Forax wrote: >>> On 12/28/2012 04:28 PM, Brian Goetz wrote: >>>> So the thing to do here is return Object[] instead of T[] / D[]. Sad, >>>> but not terrible. Not important enough to have the user pass in a >>>> factory. For want of a Pair... >>> The other solution is to send a j.u.List with a specific non mutable >>> implementation able to store only two elements.
>>> >>> R?mi >>> >>>> On 12/28/2012 9:39 AM, Remi Forax wrote: >>>>> On 12/28/2012 03:23 AM, Brian Goetz wrote: >>>>>> Here's a catalog of the currently implemented Tabulators. >>>>> [...] >>>>> >>>>>> 3. Partition. Partitions a stream according to a predicate. >>>>>> Results always are a two-element array of something. Five forms: >>>>>> >>>>>> // Basic >>>>>> Tabulator[]> >>>>>> partition(Predicate predicate) >>>>>> >>>>>> // Explicit factory >>>>>> > Tabulator >>>>>> partition(Predicate predicate, >>>>>> Supplier rowFactory) >>>>>> >>>>>> // Partitioned mutable reduce >>>>>> Tabulator >>>>>> partition(Predicate predicate, >>>>>> MutableReducer downstream) >>>>>> >>>>>> // Partitioned functional reduce >>>>>> Tabulator >>>>>> partition(Predicate predicate, >>>>>> T zero, >>>>>> BinaryOperator reducer) >>>>>> >>>>>> // Partitioned functional map-reduce >>>>>> Tabulator >>>>>> partition(Predicate predicate, >>>>>> T zero, >>>>>> Function mapper, >>>>>> BinaryOperator reducer) >>>>> You can't create an array of T (C, D) safely, so casting an array of >>>>> Object to an array of T is maybe acceptable if you control all the >>>>> access to that array like in collections, but here you export it. >>>>> >>>>> R?mi >>>>> From brian.goetz at oracle.com Sun Dec 30 19:53:41 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 30 Dec 2012 22:53:41 -0500 Subject: Cancelation -- use cases In-Reply-To: <50DDC6B3.8050805@oracle.com> References: <50DDC6B3.8050805@oracle.com> Message-ID: <50E10C45.5000508@oracle.com> Here's a lower-complexity version of cancel, that still satisfies (in series or in parallel) use cases like the following: > - Find the best possible move after thinking for 5 seconds > - Find the first solution that is better than X > - Gather solutions until we have 100 of them without bringing in the complexity or time/space overhead of dealing with encounter order. 
Since the forEach() operation works exclusively on the basis of temporal/arrival order rather than spatial/encounter order (elements are passed to the lambda in whatever order they are available, in whatever thread they are available), we could make a canceling variant of forEach: .forEachUntil(Block sink, BooleanSupplier until) Here, there is no confusion about what happens in the ordered case, no need to buffer elements, etc. Elements flow into the block until the termination condition transpires, at which point there are no more splits and existing splits dispense no more elements. I implemented this (it was trivial) and wrote a simple test program to calculate primes sequentially and in parallel, counting how many could be calculated in a fixed amount of time, starting from an infinite generator and filtering out composites: Streams.iterate(from, i -> i + 1) // sequential .filter(i -> isPrime(i)) .forEachUntil(i -> { chm.put(i, true); }, () -> System.currentTimeMillis() >= start+num); vs Streams.iterate(from, i -> i+1) // parallel .parallel() .filter(i -> isPrime(i)) .forEachUntil(i -> { chm.put(i, true); }, () -> System.currentTimeMillis() >= start+num); On a 4-core Q6600 system, in a fixed amount of time, the parallel version gathered ~3x as many primes. In terms of being able to perform useful computations on infinite streams, this seems a pretty attractive price-performer; lower spec and implementation complexity, and covers many of the use cases which would otherwise be impractical to attack with the stream approach. On 12/28/2012 11:20 AM, Brian Goetz wrote: > I've been working through some alternatives for cancellation support in > infinite streams. Looking to gather some use case background to help > evaluate the alternatives. > > In the serial case, the "gate" approach works fine -- after some > criteria transpires, stop sending elements downstream. The pipeline > flushes the elements it has, and completes early. 
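As a rough illustration of the forEachUntil semantics described above, a sequential version can be sketched over Stream.iterator() in a few lines (forEachUntil was a draft method under discussion and never shipped in this form; the class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class ForEachUntilDemo {
    // Sequential sketch: feed elements to the sink until the termination
    // condition reports true, then stop pulling from the (possibly
    // infinite) stream. No encounter order is promised.
    static <T> void forEachUntil(Stream<T> stream, Consumer<T> sink, BooleanSupplier until) {
        Iterator<T> it = stream.iterator();
        while (!until.getAsBoolean() && it.hasNext()) {
            sink.accept(it.next());
        }
    }

    public static void main(String[] args) {
        List<Integer> collected = new ArrayList<>();
        // Gather naturals from an infinite stream until we have five of them.
        forEachUntil(Stream.iterate(1, i -> i + 1),
                     collected::add,
                     () -> collected.size() >= 5);
        System.out.println(collected); // [1, 2, 3, 4, 5]
    }
}
```

The parallel variant differs only in the plumbing: once the condition fires, no new splits are created and existing splits dispense no more elements, exactly as described above.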
> > In the parallel unordered case, the gate approach similarly works fine > -- after the cancelation criteria occurs, no new splits are created, and > existing splits dispense no more elements. The computation similarly > quiesces after elements currently being processed are completed, > possibly along with any up-tree merging to combine results. > > It is the parallel ordered case that is tricky. Supposing we partition > a stream into > (a1,a2,a3), (a4,a5,a6) > > And suppose further we happen to be processing a5 when the bell goes > off. Do we want to wait for all a_i, i<5, to finish before letting the > computation quiesce? > > My gut says: for the things we intend to cancel, most of them will be > order-insensitive anyway. Things like: > > - Find the best possible move after thinking for 5 seconds > - Find the first solution that is better than X > - Gather solutions until we have 100 of them > > I believe the key use case for cancelation here will be when we are > chewing on potentially infinite streams of events (probably backed by > IO) where we want to chew until we're asked to shut down, and want to > get as much parallelism as we can cheaply. Which suggests to me the > intersection between order-sensitive stream pipelines and cancelable > stream pipelines is going to be pretty small indeed. > > Anyone want to add to this model of use cases for cancelation? > From david.holmes at oracle.com Sun Dec 30 23:24:59 2012 From: david.holmes at oracle.com (David Holmes) Date: Mon, 31 Dec 2012 17:24:59 +1000 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: <50D74310.3090709@oracle.com> References: <50D74310.3090709@oracle.com> Message-ID: <50E13DCB.40007@oracle.com> Does this really buy enough to make distinct interfaces worth their weight? 
David On 24/12/2012 3:44 AM, Brian Goetz wrote: > For types that have primitive specializations that are subtypes of the > base type (e.g., MutableReducer), we've been converging on a naming > convention that puts the subtypes as nested interfaces. For example: > > interface MutableReducer { > > // reducer methods > > interface OfInt extends MutableReducer { ... } > interface OfLong extends MutableReducer { ... } > } > > The motivation here is (a) reduce the javadoc surface area, (b) groups > related abstractions together, (c) makes it clear that these are > subsidiary abstractions, and (d) keep the cut-and-paste stuff together > in the code. The use site also looks pretty reasonable: > > class Foo implements MutableReducer.OfInt { ... } > > This shows up in Sink, IntermediateOp, TerminalOp, MutableReducer, > NodeBuilder, Node, Spliterator, etc. (It also shows up in concrete > implementation classes like ForEachOp, MatchOp, and FindOp, though these > will not be public and so (a) doesn't really apply to these.) > > Are we OK with this convention? It seems to have a tidying effect on the > codebase, the documentation, and client usage, and mitigates the pain of > the primitive specializations. (There will be grey areas where its > applicability is questionable; we can discuss those individually, but > there are a lot of things that it works for.) From brian.goetz at oracle.com Mon Dec 31 08:18:41 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 11:18:41 -0500 Subject: Streams -- philosophy Message-ID: <50E1BAE1.3000202@oracle.com> I'd like to take a quick step back to go over some of the philosophical goals for the streams library. 1. Put the programming model front and center. Bulk computations with streams will not be, in most cases, the absolutely most performant way to do anything. And that's OK. The goal here is to give a clean way to compose complex computations from simple building blocks that work with any data source. 
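The "simple building blocks that work with any data source" point can be made concrete with a small example (written against the stream API as it eventually shipped in Java 8, whose method names differ slightly from the draft under discussion): the same composed pipeline runs unchanged over any Collection source.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

public class ComposedPipeline {
    // The composed computation does not care what the underlying data
    // structure is -- any source that can produce a stream works.
    static int sumOfEvenSquares(Collection<Integer> source) {
        return source.stream()
                     .filter(i -> i % 2 == 0)   // building block: selection
                     .map(i -> i * i)           // building block: transformation
                     .reduce(0, Integer::sum);  // building block: reduction
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvenSquares(List.of(1, 2, 3, 4)));  // 20
        System.out.println(sumOfEvenSquares(Set.of(2, 4)));         // 20
    }
}
```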
We're willing to give up a little potential performance for the sake of making a programming model which is clean, expressive, and orthogonal. (And, our performance data so far suggests that we're not giving up so much as to be worrisome.) For a significant fraction of the Java code in the world, performance is nowhere near the #1 consideration -- a lot of code is plenty fast already. In these cases, being able to express things cleanly, clearly, and in a less error-prone manner yields far more value than making it faster. 2. Use data where it lives. Users should be able to use existing data sources to feed stream computations, without having to reason excessively about their characteristics, or do much work to transform them into stream sources. Existing Collections should just work. Again, this generality has a cost, which we're willing to pay. Even non-thread-safe collections like ArrayList should permit parallel traversal, so long as the user's computation meets the non-interference guidelines (i.e., don't mutate the source during the traversal, don't provide lambdas that are dependent on state that is modified during the traversal.) We're in the process of formalizing these non-interference requirements. 3. Easy onramp to parallelism. Until now, the serial and parallel expressions of a computation looked dramatically different, and it was a lot of work to go from serial to parallel. And, it was tricky, meaning users would make mistakes or avoid trying. This work should make it easy for developers to add parallelism to stream computations without major changes to their code. I hold out no hope that our general-purpose approach will ever beat the best hand-tuned code, and that's fine. The goal is to give users an attractive cost-benefit equation for parallelism; do a trivial amount of work (not quite limited to typing ".parallel()", but close), and get a reasonable amount of parallelism for almost no cost -- while minimally perturbing the source code. 4.
Make a clean break from Old Collections / a bridge to New Collections. The Collections framework was about providing a basic set of building blocks for data structures. Streams is about providing a set of building blocks for computation, that is completely divorced from the underlying data structure. Collections were huge in 1997, but they're starting to show their age. We will eventually have to do New Collections, for one of any number of reasons: 32-bit size limitation, lack of reification, pervasive mutability, take your pick. We would like for Streams to easily fit into those new collections, without tying them to Old Collections. So, it is a goal to keep Collection/List/Set/Map out of the core API. It's bad enough that we are exposing Iterator (Doug and I have been looking for alternatives, but so far, for all of them, the cure is worse than the disease.) The proximate impetus for the Tabulators work was to get Map out of the API (as it turned out, the result was far more powerful and expressive than what we started with, which is a nice bonus). 5. Balance between serial and parallel use cases. I get about an equal amount of mail suggesting that supporting the {serial,parallel} scenario is distorting the API for the "far more important" {parallel,serial} scenario. Neither camp of extremists will be satisfied here. We experimented early on with separate abstractions for serial and parallel streams; the resulting API was byzantine. (Summary: "OMG too many interfaces") Having one abstraction for both serial and parallel is overall a pretty big simplicity win, though it definitely does put pressure on the peculiarities of each. (Doug wants us to get rid of the stateful intermediate operations (sorted, removeDuplicates, limit) because they compose badly in parallel. Sam rolls his eyes every time I say "but that only works sequentially".)
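The stateful intermediate operations mentioned above did survive into Java 8 (removeDuplicates shipped as distinct()), and the tension is easy to see with limit(): it is trivial sequentially, even on an infinite generator, but in parallel it must coordinate across splits to honor encounter order. A small sequential illustration:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StatefulOps {
    // limit() is stateful: it must count elements as they pass. Sequentially
    // that is a trivial counter, even on an infinite source; in parallel it
    // must coordinate across splits, which is the cost Doug objects to.
    static List<Integer> firstNSquares(int n) {
        return Stream.iterate(1, i -> i + 1)
                     .map(i -> i * i)
                     .limit(n)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstNSquares(4));  // [1, 4, 9, 16]
    }
}
```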
Where we are in history is that the sequential scenarios are still important (and will continue to be for some time), but over time, the parallel scenarios will become more important. Designing something that is sequential-centric today would be backward-looking; designing something that is parallel-centric today is not useful to a broad slice of the user base. Expect continued tension at the edges as we try to balance the needs of both usages (and slowly educate the world about how parallel differs from serial.) 6. Have a path to an open system. Anyone can create a parallel Stream by creating a Spliterator for their data structure (or, by providing an Iterator, and letting us split that, albeit less efficiently.) That's a good start. The Stream API is designed around an extensible set of intermediate and terminal operations, which uses an "SPI" to define IntermediateOp, StatefulOp, and TerminalOp. While we do not plan to expose this SPI in 8 (purely a triage decision; it's not ready to stamp into concrete), we want to expose it as soon as practical. The internal pipeline(XxxOp) methods would be exposed, and users who create new ops could integrate them into existing pipelines like: list.stream() .filter(...) .pipeline(dropEverySecondElement()) .map(...) ... The "pipeline" method is the escape hatch to add new stages into the pipeline (this is actually how all ops are implemented internally now.) Users would then be free to create new intermediate or terminal operations and thread them into pipelines with pipeline(op). 7. Parallelism is explicit. We don't want to inflict parallelism on anyone who doesn't ask for it; retrofitting parallelism transparently in Java is likely to be as successful as retrofitting remoteness transparently into method invocations.
Our guideline is that the word "parallel" must appear somewhere, as in: list.parallelStream() Arrays.parallelSort(array) From brian.goetz at oracle.com Mon Dec 31 08:20:59 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 11:20:59 -0500 Subject: Foo.Of{Int,Long,Double} naming convention In-Reply-To: <50E13DCB.40007@oracle.com> References: <50D74310.3090709@oracle.com> <50E13DCB.40007@oracle.com> Message-ID: <50E1BB6B.4090006@oracle.com> There are two things here: - Do we need separate interfaces - Does this naming convention make sense The naming convention was proposed as a means of reducing the perceived surface area in cases where separate interfaces are needed. There are definitely cases where separate interfaces are needed. For example, Iterator. If we have a stream of ints, we need some way to ask for the ints without boxing. This could be PrimitiveIterator.OfInt, IntIterator (Interator?), IntCursor, etc. So this thread is mostly about "If we do find ourselves needing separate interfaces, what should we call them." Separately, we have been trying not to specialize when we can get away without doing so. On 12/31/2012 2:24 AM, David Holmes wrote: > Does this really buy enough to make distinct interfaces worth their weight? > > David > > On 24/12/2012 3:44 AM, Brian Goetz wrote: >> For types that have primitive specializations that are subtypes of the >> base type (e.g., MutableReducer), we've been converging on a naming >> convention that puts the subtypes as nested interfaces. For example: >> >> interface MutableReducer { >> >> // reducer methods >> >> interface OfInt extends MutableReducer { ... } >> interface OfLong extends MutableReducer { ... } >> } >> >> The motivation here is (a) reduce the javadoc surface area, (b) groups >> related abstractions together, (c) makes it clear that these are >> subsidiary abstractions, and (d) keep the cut-and-paste stuff together >> in the code.
The use site also looks pretty reasonable: >> >> class Foo implements MutableReducer.OfInt { ... } >> >> This shows up in Sink, IntermediateOp, TerminalOp, MutableReducer, >> NodeBuilder, Node, Spliterator, etc. (It also shows up in concrete >> implementation classes like ForEachOp, MatchOp, and FindOp, though these >> will not be public and so (a) doesn't really apply to these.) >> >> Are we OK with this convention? It seems to have a tidying effect on the >> codebase, the documentation, and client usage, and mitigates the pain of >> the primitive specializations. (There will be grey areas where its >> applicability is questionable; we can discuss those individually, but >> there are a lot of things that it works for.) From sam at sampullara.com Mon Dec 31 08:34:52 2012 From: sam at sampullara.com (Sam Pullara) Date: Mon, 31 Dec 2012 08:34:52 -0800 Subject: Cancelation -- use cases In-Reply-To: <50E10C45.5000508@oracle.com> References: <50DDC6B3.8050805@oracle.com> <50E10C45.5000508@oracle.com> Message-ID: I think we are conflating two things with this solution and it doesn't work for them in my mind. Here is what I would like the solution to cover: - External conditions (cancellation, cleanup) - Internal conditions (gating based on count, elements and results) The first one may be the only one that works in the parallel case. It should likely be implemented with .close() on stream that would stop the stream as soon as possible. This would be useful for things like timeouts. Kind of like calling close on an inputstream in the middle of reading it. The other one I think is necessary and hard to implement correctly with the parallel case. For instance I would like to say: stream.gate(e -> e < 10).forEach(e -> …) OR stream.gate( (e, i) -> e < 10 || i > 10).forEach(e -> …) // i is the number of the current element That should give me every element in the stream until an element isn't < 10 and then stop processing elements.
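The gate() operation proposed here never made it into the API (Java 9's takeWhile is the closest shipped analogue), but the sequential semantics Sam describes are easy to sketch with a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class GateSketch {
    // Sequential semantics of the proposed gate(): pass elements downstream
    // until the predicate first returns false, then stop processing.
    static <T> List<T> gate(Iterable<T> source, Predicate<T> continueProcessing) {
        List<T> out = new ArrayList<>();
        for (T t : source) {
            if (!continueProcessing.test(t)) {
                break;  // the gate closed; drop this and all later elements
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // Every element until one isn't < 10
        System.out.println(gate(List.of(1, 5, 9, 12, 3), e -> e < 10));  // [1, 5, 9]
    }
}
```

As the thread notes, the hard part is not this sequential case but defining what "until an element isn't < 10" means when the source has been split for parallel processing.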
Further, there should be some way for the stream source to be notified that we are done consuming it in case it is of unknown length or consumes resources. That would be more like (assuming we add a Runnable call to Timer): Stream stream = …. new Timer().schedule(() -> stream.close(), 5000); stream.forEach(e -> ….); OR stream.forEach(e -> try { … } catch() { stream.close() } ); Sadly, the first gate() case doesn't work well when parallelized. I'm willing to just specify what the behavior is for that case to get it into the API. For example, I would probably say something like "the gate will need to return false once per split to stop processing". In either of these cases I think one of the motivations needs to be that the stream may be using resources and we need to tell the source that we are done consuming it. For example, if the stream is sourced from a file, database or even a large amount of memory there should be a notification mechanism for doneness that will allow those resources to be returned before the stream is exhausted. To that end I think that Stream should implement AutoCloseable but overridden with no checked exception. interface Stream implements AutoCloseable { /** * Closes this stream and releases any system resources associated * with it. If the stream is already closed then invoking this * method has no effect. Close is automatically called when the * stream is exhausted. After this is called, no further elements * will be processed by the stream but currently processing elements * will complete normally. Calling other methods on a closed stream will * produce IllegalStateExceptions. */ void close(); /** * When the continueProcessing function returns false, no further * elements will be processed after the gate. In the parallel stream * case no further elements will be processed in the current split. */ Stream gate(Function until); /** * As gate with the addition of the current element number.
*/ Stream gate(BiFunction until); } This API avoids a lot of side effects that Brian's proposal needs to implement these use cases. BTW, just noticed you can't get the lines of a file as a Stream. I think this would enable us to do that in a way that works well -- the Files.readAllLines() method is an abomination and probably a time bomb for anyone that uses it. Sam On Dec 30, 2012, at 7:53 PM, Brian Goetz wrote: > Here's a lower-complexity version of cancel, that still satisfies (in series or in parallel) use cases like the following: > > > - Find the best possible move after thinking for 5 seconds > > - Find the first solution that is better than X > > - Gather solutions until we have 100 of them > > without bringing in the complexity or time/space overhead of dealing with encounter order. > > Since the forEach() operation works exclusively on the basis of temporal/arrival order rather than spatial/encounter order (elements are passed to the lambda in whatever order they are available, in whatever thread they are available), we could make a canceling variant of forEach: > > .forEachUntil(Block sink, BooleanSupplier until) > > Here, there is no confusion about what happens in the ordered case, no need to buffer elements, etc. Elements flow into the block until the termination condition transpires, at which point there are no more splits and existing splits dispense no more elements.
> > I implemented this (it was trivial) and wrote a simple test program to calculate primes sequentially and in parallel, counting how many could be calculated in a fixed amount of time, starting from an infinite generator and filtering out composites: > > Streams.iterate(from, i -> i + 1) // sequential > .filter(i -> isPrime(i)) > .forEachUntil(i -> { > chm.put(i, true); > }, () -> System.currentTimeMillis() >= start+num); > > vs > > Streams.iterate(from, i -> i+1) // parallel > .parallel() > .filter(i -> isPrime(i)) > .forEachUntil(i -> { > chm.put(i, true); > }, () -> System.currentTimeMillis() >= start+num); > > On a 4-core Q6600 system, in a fixed amount of time, the parallel version gathered ~3x as many primes. > > In terms of being able to perform useful computations on infinite streams, this seems a pretty attractive price-performer; lower spec and implementation complexity, and covers many of the use cases which would otherwise be impractical to attack with the stream approach. > > > > On 12/28/2012 11:20 AM, Brian Goetz wrote: >> I've been working through some alternatives for cancellation support in >> infinite streams. Looking to gather some use case background to help >> evaluate the alternatives. >> >> In the serial case, the "gate" approach works fine -- after some >> criteria transpires, stop sending elements downstream. The pipeline >> flushes the elements it has, and completes early. >> >> In the parallel unordered case, the gate approach similarly works fine >> -- after the cancelation criteria occurs, no new splits are created, and >> existing splits dispense no more elements. The computation similarly >> quiesces after elements currently being processed are completed, >> possibly along with any up-tree merging to combine results. >> >> It is the parallel ordered case that is tricky. Supposing we partition >> a stream into >> (a1,a2,a3), (a4,a5,a6) >> >> And suppose further we happen to be processing a5 when the bell goes >> off. 
Do we want to wait for all a_i, i<5, to finish before letting the >> computation quiesce? >> >> My gut says: for the things we intend to cancel, most of them will be >> order-insensitive anyway. Things like: >> >> - Find the best possible move after thinking for 5 seconds >> - Find the first solution that is better than X >> - Gather solutions until we have 100 of them >> >> I believe the key use case for cancelation here will be when we are >> chewing on potentially infinite streams of events (probably backed by >> IO) where we want to chew until we're asked to shut down, and want to >> get as much parallelism as we can cheaply. Which suggests to me the >> intersection between order-sensitive stream pipelines and cancelable >> stream pipelines is going to be pretty small indeed. >> >> Anyone want to add to this model of use cases for cancelation? >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121231/33cd8edd/attachment-0001.html From brian.goetz at oracle.com Mon Dec 31 08:46:46 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 11:46:46 -0500 Subject: Cancelation -- use cases In-Reply-To: References: <50DDC6B3.8050805@oracle.com> <50E10C45.5000508@oracle.com> Message-ID: <50E1C176.3030003@oracle.com> > BTW, just noticed you can't get the lines of > a file as a Stream. We have this: interface Reader { Stream lines() } From forax at univ-mlv.fr Mon Dec 31 08:50:21 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 31 Dec 2012 17:50:21 +0100 Subject: Cancelation -- use cases In-Reply-To: References: <50DDC6B3.8050805@oracle.com> <50E10C45.5000508@oracle.com> Message-ID: <50E1C24D.3090404@univ-mlv.fr> On 12/31/2012 05:34 PM, Sam Pullara wrote: > BTW, just noticed you can't get the lines of a file as a Stream. I > think this would enable us to do that in a way that works well ? 
the > Files.readAllLines() method is an abomination and probably a time bomb > for anyone that uses it. > > Sam If you want the history: http://mail.openjdk.java.net/pipermail/nio-dev/2008-November/000285.html and yes, we should introduce a Files.readLines() that returns a Stream. Rémi From forax at univ-mlv.fr Mon Dec 31 08:56:38 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 31 Dec 2012 17:56:38 +0100 Subject: Cancelation -- use cases In-Reply-To: <50E10C45.5000508@oracle.com> References: <50DDC6B3.8050805@oracle.com> <50E10C45.5000508@oracle.com> Message-ID: <50E1C3C6.8030000@univ-mlv.fr> I have trouble understanding the difference with: Streams.iterate(from, i -> i + 1) // sequential .filter(i -> isPrime(i)) .until(() -> System.currentTimeMillis() >= start+num)). .forEachUntil(i -> { chm.put(i, true); }; Rémi On 12/31/2012 04:53 AM, Brian Goetz wrote: > Here's a lower-complexity version of cancel, that still satisfies (in > series or in parallel) use cases like the following: > > > - Find the best possible move after thinking for 5 seconds > > - Find the first solution that is better than X > > - Gather solutions until we have 100 of them > > without bringing in the complexity or time/space overhead of dealing > with encounter order. > > Since the forEach() operation works exclusively on the basis of > temporal/arrival order rather than spatial/encounter order (elements > are passed to the lambda in whatever order they are available, in > whatever thread they are available), we could make a canceling variant > of forEach: > > .forEachUntil(Block sink, BooleanSupplier until) > > Here, there is no confusion about what happens in the ordered case, no > need to buffer elements, etc. Elements flow into the block until the > termination condition transpires, at which point there are no more > splits and existing splits dispense no more elements.
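What eventually shipped in Java 8 addresses both of Sam's and Rémi's requests: Files.lines(Path) returns a Stream<String> that holds the underlying file open, and Stream is AutoCloseable, so try-with-resources releases the resource even when the stream is abandoned before exhaustion. A small self-contained sketch (the demo() helper and its temp-file contents are illustrative):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public class LinesDemo {
    // Files.lines(Path) keeps the file open while the stream is consumed;
    // try-with-resources closes it even if the stream is not exhausted.
    static long countNonBlank(Path p) {
        try (Stream<String> lines = Files.lines(p)) {
            return lines.filter(s -> !s.isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical self-contained demo: write a small temp file, count its
    // non-blank lines, and clean up.
    static long demo() {
        try {
            Path p = Files.createTempFile("lines-demo", ".txt");
            Files.write(p, List.of("alpha", "", "beta"));
            long n = countNonBlank(p);
            Files.delete(p);
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());  // 2
    }
}
```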
> > I implemented this (it was trivial) and wrote a simple test program to > calculate primes sequentially and in parallel, counting how many could > be calculated in a fixed amount of time, starting from an infinite > generator and filtering out composites: > > Streams.iterate(from, i -> i + 1) // sequential > .filter(i -> isPrime(i)) > .forEachUntil(i -> { > chm.put(i, true); > }, () -> System.currentTimeMillis() >= start+num); > > vs > > Streams.iterate(from, i -> i+1) // parallel > .parallel() > .filter(i -> isPrime(i)) > .forEachUntil(i -> { > chm.put(i, true); > }, () -> System.currentTimeMillis() >= start+num); > > On a 4-core Q6600 system, in a fixed amount of time, the parallel > version gathered ~3x as many primes. > > In terms of being able to perform useful computations on infinite > streams, this seems a pretty attractive price-performer; lower spec > and implementation complexity, and covers many of the use cases which > would otherwise be impractical to attack with the stream approach. > > > > On 12/28/2012 11:20 AM, Brian Goetz wrote: >> I've been working through some alternatives for cancellation support in >> infinite streams. Looking to gather some use case background to help >> evaluate the alternatives. >> >> In the serial case, the "gate" approach works fine -- after some >> criteria transpires, stop sending elements downstream. The pipeline >> flushes the elements it has, and completes early. >> >> In the parallel unordered case, the gate approach similarly works fine >> -- after the cancelation criteria occurs, no new splits are created, and >> existing splits dispense no more elements. The computation similarly >> quiesces after elements currently being processed are completed, >> possibly along with any up-tree merging to combine results. >> >> It is the parallel ordered case that is tricky. Supposing we partition >> a stream into >> (a1,a2,a3), (a4,a5,a6) >> >> And suppose further we happen to be processing a5 when the bell goes >> off. 
Do we want to wait for all a_i, i<5, to finish before letting the >> computation quiesce? >> >> My gut says: for the things we intend to cancel, most of them will be >> order-insensitive anyway. Things like: >> >> - Find the best possible move after thinking for 5 seconds >> - Find the first solution that is better than X >> - Gather solutions until we have 100 of them >> >> I believe the key use case for cancelation here will be when we are >> chewing on potentially infinite streams of events (probably backed by >> IO) where we want to chew until we're asked to shut down, and want to >> get as much parallelism as we can cheaply. Which suggests to me the >> intersection between order-sensitive stream pipelines and cancelable >> stream pipelines is going to be pretty small indeed. >> >> Anyone want to add to this model of use cases for cancelation? >> From forax at univ-mlv.fr Mon Dec 31 08:59:54 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 31 Dec 2012 17:59:54 +0100 Subject: Cancelation -- use cases In-Reply-To: <50E1C176.3030003@oracle.com> References: <50DDC6B3.8050805@oracle.com> <50E10C45.5000508@oracle.com> <50E1C176.3030003@oracle.com> Message-ID: <50E1C48A.7050101@univ-mlv.fr> On 12/31/2012 05:46 PM, Brian Goetz wrote: >> BTW, just noticed you can't get the lines of >> a file as a Stream. > > We have this: > > interface Reader { > Stream lines() > } > Having it on Reader instead of Files means that the user has to close the stream himself. Rémi From forax at univ-mlv.fr Mon Dec 31 09:04:41 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 31 Dec 2012 18:04:41 +0100 Subject: Streams -- philosophy In-Reply-To: <50E1BAE1.3000202@oracle.com> References: <50E1BAE1.3000202@oracle.com> Message-ID: <50E1C5A9.5010604@univ-mlv.fr> On 12/31/2012 05:18 PM, Brian Goetz wrote: > 6. Have a path to an open system.
Anyone can create a parallel Stream > by creating a Spliterator for their data structure (or, by providing > an Iterator, and letting us split that, albeit less efficiently.) > That's a good start. > > The Stream API is designed around an extensible set of intermediate > and terminal operations, which uses an "SPI" to define IntermediateOp, > StatefulOp, and TerminalOp. While we do not plan to expose this SPI > in 8 (purely a triage decision; it's not ready to stamp into concrete), > we want to expose it as soon as practical. The internal > pipeline(XxxOp) methods would be exposed, and users who create new ops > could integrate them into existing pipelines like: > > list.stream() > .filter(...) > .pipeline(dropEverySecondElement()) > .map(...) > ... > > The "pipeline" method is the escape hatch to add new stages into the > pipeline (this is actually how all ops are implemented internally > now.) Users would then be free to create new intermediate or terminal > operations and thread them into pipelines with pipeline(op). On that, I think Stream should not have the methods iterator() and spliterator() public, but Streams should have two static methods to expose an Iterator and a Spliterator, just to not shut down the idea of finding a better abstraction before the release of 9. Rémi From brian.goetz at oracle.com Mon Dec 31 09:09:34 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 12:09:34 -0500 Subject: Streams -- philosophy In-Reply-To: <50E1C5A9.5010604@univ-mlv.fr> References: <50E1BAE1.3000202@oracle.com> <50E1C5A9.5010604@univ-mlv.fr> Message-ID: <50E1C6CE.1010406@oracle.com> > On that, I think Stream should not have the methods iterator() and > spliterator() public, > but Streams should have two static methods to expose an Iterator and a > Spliterator, > just to not shut down the idea of finding a better abstraction before the > release of 9.
I presume you mean something like: class Streams { public static Iterator iterator(Stream) public static Spliterator spliterator(Stream) But, static methods would have to work in terms of the public methods on Stream, so that they could work on any stream, not just our own. Meaning there must be some public way to get at the underlying data in the Stream interface. So I think it boils down to the same thing? From brian.goetz at oracle.com Mon Dec 31 09:26:28 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 12:26:28 -0500 Subject: Cancelation -- use cases In-Reply-To: <50E1C3C6.8030000@univ-mlv.fr> References: <50DDC6B3.8050805@oracle.com> <50E10C45.5000508@oracle.com> <50E1C3C6.8030000@univ-mlv.fr> Message-ID: <50E1CAC4.3000108@oracle.com> Yes, there's a big difference, and it has to do with the interplay of spatial (encounter) order and temporal (arrival) order in the parallel case. Earlier attempts to define a general while/until on Stream ran aground over the "what about the ordered case" problem. Moving the problem to a more constrained environment -- on top of an intrinsically unordered terminal operation -- sidesteps these tricky problems, which is why forEachUntil() is considerably simpler than a more general until(). In the sequential case, the two orders coincide and the "until" formulation is trivial and obvious. Similarly, in the unordered parallel case (e.g., hashSet.parallel()...) we are not constrained to do anything with the encounter order, in which case again the two collapse to be the same thing. The tricky case is the (common) one where there is a defined encounter order, such as when the source is a List, array, Queue, generator, or SortedSet. In such cases, processing elements out of order, or skipping elements, often gives the wrong answer, such as when you do a reduce with an associative reducer, so simply stopping computation when the music stops may give a wrong result, which we want to avoid. 
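The correctness point about associative reducers can be made concrete. Parallel reduction reassociates work across splits, so ((a+b)+(c+d)) must equal (((a+b)+c)+d); with an associative combining op like +, sequential and parallel pipelines give the same answer, and dropping arbitrary elements mid-reduction (as a naive cancel would) silently changes it. A sketch using the shipped Java 8 API:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AssociativeReduce {
    // With an associative op and a proper identity, the framework is free
    // to reassociate across splits without changing the result.
    static int sumSequential(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);
    }

    static int sumParallel(List<Integer> xs) {
        return xs.parallelStream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> xs = IntStream.rangeClosed(1, 100).boxed()
                                    .collect(Collectors.toList());
        System.out.println(sumSequential(xs));  // 5050
        System.out.println(sumParallel(xs));    // 5050
    }
}
```

A non-associative op (subtraction, say) would make the parallel result depend on how the source happened to split, which is exactly the class of wrong answers a skip-happy cancelation would introduce even for associative ops.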
(Reductions are carefully arranged to provide parallelism while preserving correctness for associative operations.) People are going to expect that "until" is defined relative to the encounter order (as in, all the elements from zero up until the element I'm processing when the bell goes off.) But at root, this is a fundamentally serial notion of until-ness. We experimented with finding weird semantics for this ("identify the indexes of all elements being processed when the bell goes off, take the max of those, and allow all prior elements in the encounter order to finish before completing.") These semantics are complicated, hard to implement efficiently (look how bad parallel limit is now -- bad enough I'm still considering dropping limit entirely, and this is worse), and would likely result in poor responsiveness to cancelation. Having extrapolated down this road, I see this leading to no cancelation facility at all, making lots of otherwise viable formulations with infinite streams impractical. By moving the until onto the forEach operation, which explicitly does not care about encounter order, we are solving a simpler problem, freeing ourselves of the thorny issues surrounding encounter order and still have a cancelation mechanism that is useful in a lot of cases. The semantics are much simpler, as is the resulting implementation (~50 lines of new code.) On 12/31/2012 11:56 AM, Remi Forax wrote: > I've trouble to understood the difference with: > Streams.iterate(from, i -> i + 1) // sequential > .filter(i -> isPrime(i)) > .until(() -> System.currentTimeMillis() >= > start+num)). 
> .forEachUntil(i -> { > chm.put(i, true); > }; > > Rémi > > On 12/31/2012 04:53 AM, Brian Goetz wrote: >> Here's a lower-complexity version of cancel, that still satisfies (in >> series or in parallel) use cases like the following: >> >> > - Find the best possible move after thinking for 5 seconds >> > - Find the first solution that is better than X >> > - Gather solutions until we have 100 of them >> >> without bringing in the complexity or time/space overhead of dealing >> with encounter order. >> >> Since the forEach() operation works exclusively on the basis of >> temporal/arrival order rather than spatial/encounter order (elements >> are passed to the lambda in whatever order they are available, in >> whatever thread they are available), we could make a canceling variant >> of forEach: >> >> .forEachUntil(Block sink, BooleanSupplier until) >> >> Here, there is no confusion about what happens in the ordered case, no >> need to buffer elements, etc. Elements flow into the block until the >> termination condition transpires, at which point there are no more >> splits and existing splits dispense no more elements. >> >> I implemented this (it was trivial) and wrote a simple test program to >> calculate primes sequentially and in parallel, counting how many could >> be calculated in a fixed amount of time, starting from an infinite >> generator and filtering out composites: >> >> Streams.iterate(from, i -> i + 1) // sequential >> .filter(i -> isPrime(i)) >> .forEachUntil(i -> { >> chm.put(i, true); >> }, () -> System.currentTimeMillis() >= start+num); >> >> vs >> >> Streams.iterate(from, i -> i+1) // parallel >> .parallel() >> .filter(i -> isPrime(i)) >> .forEachUntil(i -> { >> chm.put(i, true); >> }, () -> System.currentTimeMillis() >= start+num); >> >> On a 4-core Q6600 system, in a fixed amount of time, the parallel >> version gathered ~3x as many primes.
>> >> In terms of being able to perform useful computations on infinite >> streams, this seems a pretty attractive price-performer; lower spec >> and implementation complexity, and covers many of the use cases which >> would otherwise be impractical to attack with the stream approach. >> >> >> >> On 12/28/2012 11:20 AM, Brian Goetz wrote: >>> I've been working through some alternatives for cancellation support in >>> infinite streams. Looking to gather some use case background to help >>> evaluate the alternatives. >>> >>> In the serial case, the "gate" approach works fine -- after some >>> criteria transpires, stop sending elements downstream. The pipeline >>> flushes the elements it has, and completes early. >>> >>> In the parallel unordered case, the gate approach similarly works fine >>> -- after the cancelation criteria occurs, no new splits are created, and >>> existing splits dispense no more elements. The computation similarly >>> quiesces after elements currently being processed are completed, >>> possibly along with any up-tree merging to combine results. >>> >>> It is the parallel ordered case that is tricky. Supposing we partition >>> a stream into >>> (a1,a2,a3), (a4,a5,a6) >>> >>> And suppose further we happen to be processing a5 when the bell goes >>> off. Do we want to wait for all a_i, i<5, to finish before letting the >>> computation quiesce? >>> >>> My gut says: for the things we intend to cancel, most of them will be >>> order-insensitive anyway. Things like: >>> >>> - Find the best possible move after thinking for 5 seconds >>> - Find the first solution that is better than X >>> - Gather solutions until we have 100 of them >>> >>> I believe the key use case for cancelation here will be when we are >>> chewing on potentially infinite streams of events (probably backed by >>> IO) where we want to chew until we're asked to shut down, and want to >>> get as much parallelism as we can cheaply. 
Which suggests to me the >>> intersection between order-sensitive stream pipelines and cancelable >>> stream pipelines is going to be pretty small indeed. >>> >>> Anyone want to add to this model of use cases for cancelation? >>> > From sam at sampullara.com Mon Dec 31 09:31:19 2012 From: sam at sampullara.com (Sam Pullara) Date: Mon, 31 Dec 2012 09:31:19 -0800 Subject: Cancelation -- use cases In-Reply-To: <50E1C3C6.8030000@univ-mlv.fr> References: <50DDC6B3.8050805@oracle.com> <50E10C45.5000508@oracle.com> <50E1C3C6.8030000@univ-mlv.fr> Message-ID: I'm not a big fan of putting global conditions / side effects into the stream operations. Obviously you can do the same thing with my gate suggestion and can even add arbitrary cancellation: AtomicBoolean until = new AtomicBoolean(false); Streams.iterate(from, i -> i + 1) // sequential .filter(i -> isPrime(i)) .until(() -> until.get()) .forEach(i -> { chm.put(i, true); }); ... until.set(true); But without the close() propagating up the chain, I'm not sure how to recover resources from the Stream except through GC finalizers. We can't simply exhaust the stream as it could be of infinite length or very expensive to complete. On Mon, Dec 31, 2012 at 8:56 AM, Remi Forax wrote: > I have trouble understanding the difference with: > > Streams.iterate(from, i -> i + 1) // sequential > .filter(i -> isPrime(i)) > .until(() -> System.currentTimeMillis() >= start+num) > .forEach(i -> { > chm.put(i, true); > }); > > Rémi > > > On 12/31/2012 04:53 AM, Brian Goetz wrote: > >> Here's a lower-complexity version of cancel, that still satisfies (in >> series or in parallel) use cases like the following: >> >> > - Find the best possible move after thinking for 5 seconds >> > - Find the first solution that is better than X >> > - Gather solutions until we have 100 of them >> >> without bringing in the complexity or time/space overhead of dealing with >> encounter order.
>> >> Since the forEach() operation works exclusively on the basis of >> temporal/arrival order rather than spatial/encounter order (elements are >> passed to the lambda in whatever order they are available, in whatever >> thread they are available), we could make a canceling variant of forEach: >> >> .forEachUntil(Block sink, BooleanSupplier until) >> >> Here, there is no confusion about what happens in the ordered case, no >> need to buffer elements, etc. Elements flow into the block until the >> termination condition transpires, at which point there are no more splits >> and existing splits dispense no more elements. >> >> I implemented this (it was trivial) and wrote a simple test program to >> calculate primes sequentially and in parallel, counting how many could be >> calculated in a fixed amount of time, starting from an infinite generator >> and filtering out composites: >> >> Streams.iterate(from, i -> i + 1) // sequential >> .filter(i -> isPrime(i)) >> .forEachUntil(i -> { >> chm.put(i, true); >> }, () -> System.currentTimeMillis() >= start+num); >> >> vs >> >> Streams.iterate(from, i -> i+1) // parallel >> .parallel() >> .filter(i -> isPrime(i)) >> .forEachUntil(i -> { >> chm.put(i, true); >> }, () -> System.currentTimeMillis() >= start+num); >> >> On a 4-core Q6600 system, in a fixed amount of time, the parallel version >> gathered ~3x as many primes. >> >> In terms of being able to perform useful computations on infinite >> streams, this seems a pretty attractive price-performer; lower spec and >> implementation complexity, and covers many of the use cases which would >> otherwise be impractical to attack with the stream approach. >> >> >> >> On 12/28/2012 11:20 AM, Brian Goetz wrote: >> >>> I've been working through some alternatives for cancellation support in >>> infinite streams. Looking to gather some use case background to help >>> evaluate the alternatives. 
>>> >>> In the serial case, the "gate" approach works fine -- after some >>> criteria transpires, stop sending elements downstream. The pipeline >>> flushes the elements it has, and completes early. >>> >>> In the parallel unordered case, the gate approach similarly works fine >>> -- after the cancelation criteria occurs, no new splits are created, and >>> existing splits dispense no more elements. The computation similarly >>> quiesces after elements currently being processed are completed, >>> possibly along with any up-tree merging to combine results. >>> >>> It is the parallel ordered case that is tricky. Supposing we partition >>> a stream into >>> (a1,a2,a3), (a4,a5,a6) >>> >>> And suppose further we happen to be processing a5 when the bell goes >>> off. Do we want to wait for all a_i, i<5, to finish before letting the >>> computation quiesce? >>> >>> My gut says: for the things we intend to cancel, most of them will be >>> order-insensitive anyway. Things like: >>> >>> - Find the best possible move after thinking for 5 seconds >>> - Find the first solution that is better than X >>> - Gather solutions until we have 100 of them >>> >>> I believe the key use case for cancelation here will be when we are >>> chewing on potentially infinite streams of events (probably backed by >>> IO) where we want to chew until we're asked to shut down, and want to >>> get as much parallelism as we can cheaply. Which suggests to me the >>> intersection between order-sensitive stream pipelines and cancelable >>> stream pipelines is going to be pretty small indeed. >>> >>> Anyone want to add to this model of use cases for cancelation? >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121231/f1f41386/attachment.html From forax at univ-mlv.fr Mon Dec 31 09:36:31 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 31 Dec 2012 18:36:31 +0100 Subject: Streams -- philosophy In-Reply-To: <50E1C6CE.1010406@oracle.com> References: <50E1BAE1.3000202@oracle.com> <50E1C5A9.5010604@univ-mlv.fr> <50E1C6CE.1010406@oracle.com> Message-ID: <50E1CD1F.7090503@univ-mlv.fr> On 12/31/2012 06:09 PM, Brian Goetz wrote: >> On that, I think Stream should not have the method iterator() and >> spliterator() public, >> but Streams should have two static methods to expose an Iterator and a >> Spliterator, >> just to not shut down the idea to find a better abstraction before the >> release of 9. > > I presume you mean something like: > > class Streams { > public static Iterator iterator(Stream) > public static Spliterator spliterator(Stream) > > But, static methods would have to work in terms of the public methods > on Stream, so that they could work on any stream, not just our own. > Meaning there must be some public way to get at the underlying data in > the Stream interface. So I think it boils down to the same thing? > It's the classic SPI problem: each stream implementation registers an interface, and the code of Stream.iterator() checks the incoming Stream against each interface; if the Stream implements the interface, it calls iterator() on it.
Rémi From brian.goetz at oracle.com Mon Dec 31 10:08:45 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 13:08:45 -0500 Subject: Tabulators, reducers, etc In-Reply-To: <50DC69DF.3000001@oracle.com> References: <50DC69DF.3000001@oracle.com> Message-ID: <50E1D4AD.1060709@oracle.com> > One option might be: use "reduce" for the purely functional forms, use > accumulate/accumulateConcurrent for the others: > > T reduce(T zero, BinaryOperator reducer); > Optional reduce(BinaryOperator reducer); > U reduce(U zero, BiFunction accumulator, > BinaryOperator reducer); > > R accumulate(Accumulator reducer); > R accumulate(Supplier seedFactory, > BiBlock accumulator, > BiBlock reducer); > > R accumulateConcurrent(ConcurrentAccumulator tabulator); > > This would let us get rid of the Tabulator abstraction (it is > identical to MutableReducer; both get renamed to Accumulator). > Separately, with a small crowbar, we could simplify > ConcurrentAccumulator down to fitting into existing SAMs, and the > top-level abstraction could go away. While the concurrent use case is clearly the odd man out here -- suggesting more work is left to do on this -- the rest of it seems an improvement on what we have now. I would like to move forward with this while we continue to work out the correct set of canned accumulators and the correct way to surface concurrent accumulation.
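The accumulate(seedFactory, accumulator, reducer) form above is a mutable reduction. A minimal sequential sketch, with Supplier and BiConsumer standing in for the proposal's Supplier/BiBlock pair (the reducer that merges partial containers only matters in the parallel case, so it is omitted here):

```java
import java.util.Arrays;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

public class AccumulateSketch {
    // Sequential sketch of R accumulate(Supplier seedFactory, BiBlock accumulator, ...):
    // seed a fresh mutable container, then fold each element into it in place.
    static <T, R> R accumulate(Iterable<T> source,
                               Supplier<R> seedFactory,
                               BiConsumer<R, T> accumulator) {
        R container = seedFactory.get();
        for (T t : source) {
            accumulator.accept(container, t);  // mutate in place, no new containers
        }
        return container;
    }

    public static void main(String[] args) {
        // Accumulate strings into a StringBuilder instead of reducing immutably
        String joined = accumulate(Arrays.asList("a", "b", "c"),
                                   StringBuilder::new,
                                   StringBuilder::append).toString();
        System.out.println(joined); // abc
    }
}
```

In the parallel case, the runtime would call the seed factory once per leaf task and use the reducer BiBlock to merge the partial containers, which is why both blocks appear in the signature.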
From forax at univ-mlv.fr Mon Dec 31 10:17:40 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 31 Dec 2012 19:17:40 +0100 Subject: Tabulators, reducers, etc In-Reply-To: <50E1D4AD.1060709@oracle.com> References: <50DC69DF.3000001@oracle.com> <50E1D4AD.1060709@oracle.com> Message-ID: <50E1D6C4.30306@univ-mlv.fr> On 12/31/2012 07:08 PM, Brian Goetz wrote: >> One option might be: use "reduce" for the purely functional forms, use >> accumulate/accumulateConcurrent for the others: >> >> T reduce(T zero, BinaryOperator reducer); >> Optional reduce(BinaryOperator reducer); >> U reduce(U zero, BiFunction accumulator, >> BinaryOperator reducer); >> >> R accumulate(Accumulator reducer); >> R accumulate(Supplier seedFactory, >> BiBlock accumulator, >> BiBlock reducer); >> >> R accumulateConcurrent(ConcurrentAccumulator tabulator); >> >> This would let us get rid of the Tabulator abstraction (it is >> identical to MutableReducer; both get renamed to Accumulator). >> Separately, with a small crowbar, we could simplify >> ConcurrentAccumulator down to fitting into existing SAMs, and the >> top-level abstraction could go away. > > While the concurrent use case is clearly the odd man out here -- > suggesting more work is left to do on this -- the rest of it seems an > improvement on what we have now. I would like to move forward with > this while we continue to work out the correct set of canned > accumulators and the correct way to surface concurrent accumulation. Why do you want to surface a concurrent accumulator given that we have forEach? Rémi From joe.darcy at oracle.com Mon Dec 31 10:23:49 2012 From: joe.darcy at oracle.com (Joe Darcy) Date: Mon, 31 Dec 2012 10:23:49 -0800 Subject: Primitive streams In-Reply-To: <50DF2FB3.6050405@univ-mlv.fr> References: <50DDDD0C.7030700@oracle.com> <50DF2FB3.6050405@univ-mlv.fr> Message-ID: <50E1D835.9080200@oracle.com> On 12/29/2012 10:00 AM, Remi Forax wrote: > On 12/28/2012 06:55 PM, Brian Goetz wrote: > [snip] >> >> >> 3.
Numerics. Adding up doubles is not as simple as reducing with >> Double::sum (unless you don't care about accuracy.) Having methods >> for numeric sums gives us a place to put such intelligence; general >> reduce does not. > > I'm always afraid when someone tries to put "intelligence" in a program. > We never have the same. > Just adding up floating-point numbers is a subtle and interesting topic of study. For example, there is a whole chapter on this matter in Nicholas Higham's "Accuracy and Stability of Numerical Algorithms." While Java is distinguished by its *predictable* floating-point semantics, just adding up a sequence of double numbers as "a + b" without any further processing or state can yield very poor numerical results. Therefore, I think it is more prudent if we could say something like "this summation of double numbers must have an error bound less than x" where x is in part a function of the number of input values. This is analogous to the quality of implementation requirements found in the java.lang.Math class which allow for alternative implementations of sin, cos, etc. -Joe From brian.goetz at oracle.com Mon Dec 31 10:40:19 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 13:40:19 -0500 Subject: Tabulators, reducers, etc In-Reply-To: <50E1D6C4.30306@univ-mlv.fr> References: <50DC69DF.3000001@oracle.com> <50E1D4AD.1060709@oracle.com> <50E1D6C4.30306@univ-mlv.fr> Message-ID: <50E1DC13.4060003@oracle.com> You are correct that this is mostly sugar around forEach. But we want to enable the same sort of flexible, composable operation for concurrent accumulations. If the user wants to do a three-level groupBy, writing the Block for this is ugly, and we want users to be able to reuse canned tabulations like groupBy, partition, etc. If that means a thin layer around forEach (just as now tabulate is a thin layer around reduce), that's fine.
If it means we use forEach, but have combinators for constructing the appropriate blocks, that's fine too. That's exactly the discussion I'd like to have surrounding concurrent accumulation. On 12/31/2012 1:17 PM, Remi Forax wrote: > On 12/31/2012 07:08 PM, Brian Goetz wrote: >> One option might be: use "reduce" for the purely functional forms, use >> accumulate/accumulateConcurrent for the others: >> >> T reduce(T zero, BinaryOperator reducer); >> Optional reduce(BinaryOperator reducer); >> U reduce(U zero, BiFunction accumulator, >> BinaryOperator reducer); >> >> R accumulate(Accumulator reducer); >> R accumulate(Supplier seedFactory, >> BiBlock accumulator, >> BiBlock reducer); >> >> R accumulateConcurrent(ConcurrentAccumulator tabulator); >> >> This would let us get rid of the Tabulator abstraction (it is >> identical to MutableReducer; both get renamed to Accumulator). >> Separately, with a small crowbar, we could simplify >> ConcurrentAccumulator down to fitting into existing SAMs, and the >> top-level abstraction could go away. > > While the concurrent use case is clearly the odd man out here -- > suggesting more work is left to do on this -- the rest of it seems an > improvement on what we have now. I would like to move forward with > this while we continue to work out the correct set of canned > accumulators and the correct way to surface concurrent accumulation. > Why do you want to surface a concurrent accumulator given that we have > forEach? > > Rémi > From brian.goetz at oracle.com Mon Dec 31 10:41:44 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 13:41:44 -0500 Subject: random streams Message-ID: <50E1DC68.8070405@oracle.com> On the list of requested stream sources is 'stream of random numbers'.
Here's a one-line addition to Random: public IntStream ints() { return PrimitiveStreams.repeatedly(this::nextInt); } Certainly the implementation is straightforward enough (modulo renaming of PrimitiveStreams and repeatedly, which are not yet nailed down.) Any objections here? Clearly we'd want to support streams of ints, longs, and doubles, so just calling it stream() is wrong; should they be called random.ints(), or random.intStream()? From dl at cs.oswego.edu Mon Dec 31 10:54:02 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 31 Dec 2012 13:54:02 -0500 Subject: random streams In-Reply-To: <50E1DC68.8070405@oracle.com> References: <50E1DC68.8070405@oracle.com> Message-ID: <50E1DF4A.4000406@cs.oswego.edu> On 12/31/12 13:41, Brian Goetz wrote: > On the list of requested stream sources is 'stream of random numbers'. > > Here's a one-line addition to Random: > > public IntStream ints() { > return PrimitiveStreams.repeatedly(this::nextInt); > } > > Certainly the implementation is straightforward enough (modulo renaming of > PrimitiveStreams and repeatedly, which are not yet nailed down.) This is not so straightforward under parallel operations. This is a surprisingly deep topic with a lot of technical papers etc. As a first pass, you'd just use ThreadLocalRandom() as sources (to avoid horrible update contention), and make no promises about the aggregate randomness across parallel operations. However, people will come to expect that if you start off computations with a common seed, then you get both replicability and independence, which is not easy to deliver. As a start, you'd need a better generator than the one in Random (which is and must be the same algorithm used in ThreadLocalRandom). 
-Doug From dl at cs.oswego.edu Mon Dec 31 11:14:26 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 31 Dec 2012 14:14:26 -0500 Subject: Tabulators, reducers, etc In-Reply-To: <50E1DC13.4060003@oracle.com> References: <50DC69DF.3000001@oracle.com> <50E1D4AD.1060709@oracle.com> <50E1D6C4.30306@univ-mlv.fr> <50E1DC13.4060003@oracle.com> Message-ID: <50E1E412.6040606@cs.oswego.edu> On 12/31/12 13:40, Brian Goetz wrote: > You are correct that this is mostly sugar around forEach. But we want to enable > the same sort of flexible, composable operation for concurrent accumulations. > If the user wants to do a three-level groupBy, writing the Block for this is > ugly, and we want users to be able to reuse canned tabulations like groupBy, > partition, etc. > > If that means a thin layer around forEach (just as now tabulate is a thin layer > around reduce), that's fine. If it means we use forEach, but have combinators > for constructing the appropriate blocks, that's fine too. That's exactly the > discussion I'd like to have surrounding concurrent accumulation. On the off chance that anyone has forgotten :-) my take is still that whatever this ends up looking like, the best-practices advice will be to use explicit par/seq forEach with explicitly concurrent (or not) concrete destinations for all mutative updates.
-Doug > > > On 12/31/2012 1:17 PM, Remi Forax wrote: >> On 12/31/2012 07:08 PM, Brian Goetz wrote: >>>> One option might be: use "reduce" for the purely functional forms, use >>>> accumulate/accumulateConcurrent for the others: >>>> >>>> T reduce(T zero, BinaryOperator reducer); >>>> Optional reduce(BinaryOperator reducer); >>>> U reduce(U zero, BiFunction accumulator, >>>> BinaryOperator reducer); >>>> >>>> R accumulate(Accumulator reducer); >>>> R accumulate(Supplier seedFactory, >>>> BiBlock accumulator, >>>> BiBlock reducer); >>>> >>>> R accumulateConcurrent(ConcurrentAccumulator tabulator); >>>> >>>> This would let us get rid of the Tabulator abstraction (it is >>>> identical to MutableReducer; both get renamed to Accumulator). >>>> Separately, with a small crowbar, we could simplify >>>> ConcurrentAccumulator down to fitting into existing SAMs, and the >>>> top-level abstraction could go away. >>> >>> While the concurrent use case is clearly the odd man out here -- >>> suggesting more work is left to do on this -- the rest of it seems an >>> improvement on what we have now. I would like to move forward with >>> this while we continue to work out the correct set of canned >>> accumulators and the correct way to surface concurrent accumulation. >> >> while do you want to surface concurrent accumulator given that we have >> forEach ?? >> >> R?mi >> > From brian.goetz at oracle.com Mon Dec 31 11:18:00 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 14:18:00 -0500 Subject: random streams In-Reply-To: <50E1DF4A.4000406@cs.oswego.edu> References: <50E1DC68.8070405@oracle.com> <50E1DF4A.4000406@cs.oswego.edu> Message-ID: <50E1E4E8.9060202@oracle.com> Sharing random number generators is indeed a tricky thing. But, this doesn't change the status quo, it just wraps it differently. Doing: rng.ints().parallel().forEach(i -> { ... i ... }) is no different from submitting code like { ... i = rng.nextInt(); i ... 
}) to an FJP; you'll get exactly the same interleavings. ThreadLocalRandom can override this to implement: public IntStream ints() { return PrimitiveStreams.repeatedly( () -> TLR.current().nextInt()); } On 12/31/2012 1:54 PM, Doug Lea wrote: > On 12/31/12 13:41, Brian Goetz wrote: >> On the list of requested stream sources is 'stream of random numbers'. >> >> Here's a one-line addition to Random: >> >> public IntStream ints() { >> return PrimitiveStreams.repeatedly(this::nextInt); >> } >> >> Certainly the implementation is straightforward enough (modulo >> renaming of >> PrimitiveStreams and repeatedly, which are not yet nailed down.) > > This is not so straightforward under parallel operations. > This is a surprisingly deep topic with a lot of technical papers etc. > As a first pass, you'd just use ThreadLocalRandom() as sources > (to avoid horrible update contention), and make no promises > about the aggregate randomness across parallel operations. > However, people will come to expect that if you start off > computations with a common seed, then you get both replicability > and independence, which is not easy to deliver. > As a start, you'd need a better generator than the > one in Random (which is and must be the same algorithm > used in ThreadLocalRandom). > > -Doug > > From dl at cs.oswego.edu Mon Dec 31 11:32:42 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 31 Dec 2012 14:32:42 -0500 Subject: random streams In-Reply-To: <50E1E4E8.9060202@oracle.com> References: <50E1DC68.8070405@oracle.com> <50E1DF4A.4000406@cs.oswego.edu> <50E1E4E8.9060202@oracle.com> Message-ID: <50E1E85A.50100@cs.oswego.edu> On 12/31/12 14:18, Brian Goetz wrote: > Sharing random number generators is indeed a tricky thing. But, this doesn't > change the status quo, it just wraps it differently. Doing: > > rng.ints().parallel().forEach(i -> { ... i ... }) > > is no different from submitting code like > > { ... i = rng.nextInt(); i ... 
}) > to an FJP; you'll get exactly the same interleavings. You might recall that the reason for introducing TLR in JDK7 was that people using FJ/ParallelArray were reporting horrible contention problems. (Initially, thread-local randoms were available only as a utility method in FJ, then made stand-alone for JDK7.) We'd sorta rather not let this lesson be lost :-) > > ThreadLocalRandom can override this to implement: > > public IntStream ints() { > return PrimitiveStreams.repeatedly( > () -> TLR.current().nextInt()); > } > How about ONLY adding to TLR? -Doug > > > > > > On 12/31/2012 1:54 PM, Doug Lea wrote: >> On 12/31/12 13:41, Brian Goetz wrote: >>> On the list of requested stream sources is 'stream of random numbers'. >>> >>> Here's a one-line addition to Random: >>> >>> public IntStream ints() { >>> return PrimitiveStreams.repeatedly(this::nextInt); >>> } >>> >>> Certainly the implementation is straightforward enough (modulo >>> renaming of >>> PrimitiveStreams and repeatedly, which are not yet nailed down.) >> >> This is not so straightforward under parallel operations. >> This is a surprisingly deep topic with a lot of technical papers etc. >> As a first pass, you'd just use ThreadLocalRandom() as sources >> (to avoid horrible update contention), and make no promises >> about the aggregate randomness across parallel operations. >> However, people will come to expect that if you start off >> computations with a common seed, then you get both replicability >> and independence, which is not easy to deliver. >> As a start, you'd need a better generator than the >> one in Random (which is and must be the same algorithm >> used in ThreadLocalRandom).
>> >> -Doug >> >> > From brian.goetz at oracle.com Mon Dec 31 11:49:47 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 14:49:47 -0500 Subject: random streams In-Reply-To: <50E1E85A.50100@cs.oswego.edu> References: <50E1DC68.8070405@oracle.com> <50E1DF4A.4000406@cs.oswego.edu> <50E1E4E8.9060202@oracle.com> <50E1E85A.50100@cs.oswego.edu> Message-ID: <50E1EC5B.3020201@oracle.com> > (Initially, thread-local randoms were available only as > a utility method in FJ, then made stand-alone for JDK7.) > We'd sorta rather not let this lesson be lost :-) Right. Of course, we have this problem today with Random.nextInt. Is your argument that Random.ints() will be such an attractive nuisance that the situation is going to be far worse than with the existing Random/ThreadLocalRandom divide? >> ThreadLocalRandom can override this to implement: >> >> public IntStream ints() { >> return PrimitiveStreams.repeatedly( >> () -> TLR.current().nextInt()); >> } > > How about ONLY adding to TLR? Several potential objections come to mind: - The above TLR formulation puts a ThreadLocal.get on the path to every random number. Is that an overhead we need to impose on the serial case just so people don't shoot themselves in the foot in the parallel case? - Discoverability. It will be much easier to find on Random. - Non-uniformity. Since SecureRandom and TLR extend Random, having it on the base class, where subclasses can provide a better implementation, seems more predictable and complete. 
These basically boil down to "seems kinda like we're hosing the folks who just want a serial stream of random numbers for test programs, just because someone might misuse it in the parallel case (in exactly the same way they can still misuse it without stream sugar.)" From brian.goetz at oracle.com Mon Dec 31 13:11:36 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 31 Dec 2012 16:11:36 -0500 Subject: Background: pipeline architecture Message-ID: <50E1FF88.1080702@oracle.com> Here's an attempt at putting most of the background on the architecture of pipelines in one place. A complete stream pipeline has several components: - A source - Source flags - Zero or more intermediate operations - one terminal operation The source is a Supplier. The reason for the indirection is so we can narrow the window where we require non-interference from (stream creation, end of terminal operation) down to (start of terminal operation, end of terminal operation.) So, for example, this case: list = new ... Stream s = list.stream(); // mutate list s.forEach(...) will see the list state as it was before the forEach, not at the list.stream() capture. This is easy to implement; the stream() method in ArrayList is: return Streams.stream( () -> Arrays.spliterator((E[]) elementData, 0, size), flags); By deferring evaluation of elementData and size, we can late-bind to the data. The source flags are a set of dynamic properties of the source. Defined flags include: - SIZED -- the size is known - ORDERED -- the source has a defined encounter order - SORTED -- the source is sorted by natural order - DISTINCT -- the source has distinct elements according to .equals We also encode serial-vs-parallel as a source flag, though this is mostly an implementation expediency. Intermediate operations produce a new stream, lazily; invoking an intermediate operation on a stream just sets up the new stream, but does not cause any computation nor does it cause the source to be consumed. 
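The late-binding behavior of the Supplier-based source described above can be sketched with a holder that defers calling the supplier until a terminal operation runs. LazyStream here is an illustrative stand-in, not the real pipeline class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class LateBindingSketch {
    // Illustrative stand-in for a Supplier-backed stream source: the
    // supplier is not invoked until the terminal operation runs, so the
    // window requiring non-interference starts at the terminal op.
    public static class LazyStream<T> {
        private final Supplier<Spliterator<T>> source;
        public LazyStream(Supplier<Spliterator<T>> source) { this.source = source; }
        public void forEach(Consumer<T> action) {
            source.get().forEachRemaining(action); // source captured only now
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        LazyStream<String> s = new LazyStream<>(list::spliterator);
        list.add("b"); // mutate between "stream creation" and the terminal op
        List<String> seen = new ArrayList<>();
        s.forEach(seen::add);
        System.out.println(seen); // [a, b] -- the late addition is visible
    }
}
```

Because only the method reference list::spliterator is captured at stream creation, the element array and size are read when the terminal operation fires, exactly as in the ArrayList.stream() snippet above.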
The fundamental operation implemented by an intermediate operation is to wrap a "Sink" with a new Sink. So in a pipeline like list.filter(...).map(...).reduce(...) the terminal reduction operation will create a sink which reduces the values fed to it, the map operation will wrap that with a mapping sink that transforms values as they pass through, and the filter operation will wrap it with a filtering sink that only passes some elements through. Intermediate operations are divided into two kinds, stateful and stateless. The stateless operations are the well-behaved ones, which depend only on their inputs -- filter, map, mapMulti. The stateful ones include sorted, removeDuplicates, and limit. In a sequential pipeline, all intermediate ops can be jammed together for a single pass on the data. In a parallel pipeline, the same can be done only if all intermediate ops are stateless. Otherwise, we slice up the pipeline into segments ending in stateful ops, and execute them in segments: list.parallel().filter(...).sorted().map(...).reduce(...) ^-------------------^ ^------------------^ segment 1 segment 2 where the output of segment 1 is gathered into a conc-tree and then used as the source for segment 2. This segmentation is why Doug hates these operations; it complicates the parallel execution, obfuscates the cost model, and maps much more poorly to targets like GPUs. Each intermediate op also has a mask describing its flags. For each of the flags X described above, two bits are used to represent "injects X", "preserves X", or "clears X". For example, sorted() preserves size and injects sortedness and ordering. Filtering preserves ordering and distinctness but clears size. Mapping preserves size but clears sortedness and distinctness. The flags for a pipeline are computed with boolean fu to take the source flags and fold in the effect of each op as the pipeline is built.
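The "boolean fu" for folding op effects into the source flags can be sketched with plain bit sets. The single-bit encoding and the combine step here are illustrative only; the real scheme uses two bits per flag to encode injects/preserves/clears:

```java
public class StreamFlagsSketch {
    // Illustrative one-bit-per-flag encoding (the actual StreamOpFlag scheme
    // uses two bits per flag to distinguish injects/preserves/clears).
    static final int SIZED = 1, ORDERED = 2, SORTED = 4, DISTINCT = 8;

    // Fold one op's effect into the pipeline flags: set what it injects,
    // clear what it clears; everything else is preserved unchanged.
    static int combine(int upstream, int injects, int clears) {
        return (upstream | injects) & ~clears;
    }

    public static void main(String[] args) {
        int flags = SIZED | ORDERED;                   // e.g. an ArrayList source
        flags = combine(flags, 0, SIZED);              // filter(): preserves order, clears size
        flags = combine(flags, SORTED | ORDERED, 0);   // sorted(): injects sortedness and ordering
        System.out.println((flags & SORTED) != 0);     // true
        System.out.println((flags & SIZED) != 0);      // false
        System.out.println((flags & ORDERED) != 0);    // true
    }
}
```

With this bookkeeping the terminal op can cheaply test, say, whether SORTED survived the whole pipeline and elide a downstream sorted() as described below.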
There is also a SHORT_CIRCUIT flag which is only valid on ops (not source), and forces pull rather than push evaluation. Examples of short-circuit operations include limit(). Terminal operations cause evaluation of the pipeline; at the time a terminal operation is executed, the source is consumed (calling get() on the Supplier), a chain of sinks is created, parallel decomposition using spliterators is done, etc. Flags are used to optimize both sink chain construction and terminal execution. For example, if the upstream flags indicate sortedness, a sorted() operation is a no-op, reflected by the implementation of wrapSink(flags, sink) just returning the sink it was passed. Similarly, for terminal ops, orderedness can be used to relax constraints on the output, enabling more efficient computation if you know that the result need not respect encounter order. If the source is known to be sized, and all the ops are size-preserving, operations like toArray() can exploit size information to minimize allocation and copying. The set of operations is defined in Stream for reference streams, and IntStream for int streams; each of these has a (private) implementation class {Reference,Int}Pipeline which share a (private) base class AbstractPipeline. We represent a stream pipeline as a linked list of XxxPipeline objects, where each holds an op and links to its parent. Because of the shared base class, pipelines can cross shapes and operations can still be jammed together into a single pass, such as in: people.stream().filter(..).map(Person::getHeight).max(); ^Stream ^Stream ^IntStream and even though the "shape" of the data changes from reference to int we can create a single sink chain where we push Person objects in and (unboxed) ints come out.
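The sink-wrapping scheme described above can be sketched with Consumer playing the role of Sink: each intermediate op wraps the downstream sink, so one pass pushes every element through the whole chain. filterSink and mapSink are illustrative names, not the internal Sink API:

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

public class SinkChainSketch {
    // filter: wrap the downstream sink so only matching elements pass through
    static <T> Consumer<T> filterSink(Predicate<T> p, Consumer<T> down) {
        return t -> { if (p.test(t)) down.accept(t); };
    }

    // map: wrap the downstream sink so values are transformed as they pass
    static <T, R> Consumer<T> mapSink(Function<T, R> f, Consumer<R> down) {
        return t -> down.accept(f.apply(t));
    }

    public static void main(String[] args) {
        int[] sum = new int[1];                       // terminal reducing sink
        Consumer<Integer> reduce = i -> sum[0] += i;
        // Build the chain back-to-front, as in list.filter(...).map(...).reduce(...)
        Consumer<Integer> mapped = mapSink((Integer i) -> i * 10, reduce);
        Consumer<Integer> chain  = filterSink((Integer i) -> i % 2 == 0, mapped);
        for (int i = 1; i <= 4; i++) chain.accept(i); // single pass over the data
        System.out.println(sum[0]); // 60 (2*10 + 4*10)
    }
}
```

Nothing in the chain buffers or allocates per element, which is what lets a sequential pipeline of stateless ops run as a single fused pass regardless of how many ops were stacked up.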