From paul.sandoz at oracle.com Fri Feb 1 01:22:02 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 1 Feb 2013 10:22:02 +0100
Subject: Encounter order: take 2
In-Reply-To: 
References: <3843CA7B-CA6F-425A-990B-40D1250CF1AB@oracle.com>
Message-ID: <27A3D1EC-361D-4D4E-8C73-A674EC6772A4@oracle.com>

On Feb 1, 2013, at 12:27 AM, Mike Duigou wrote:
>>
>> An intermediate operation may clear encounter order, so that the output stream and the corresponding input stream to the next intermediate or terminal operation do not have encounter order.
>> There are no such operations implemented.
>> (Previously the unordered() operation cleared encounter order.)
>>
>> Otherwise an intermediate operation must preserve encounter order if required to do so (see next paragraphs).
>>
>> An intermediate operation may choose to apply a different algorithm depending on whether the encounter order of the input stream must be preserved.
>> The distinct() operation will, when evaluating in parallel, use a ConcurrentHashMap to store unique elements if encounter order does not need to be preserved; otherwise, if encounter order needs to be preserved, a fold will be performed (equivalent of, in parallel, mapping each element to a singleton linked set, then associatively reducing, left-to-right, the linked sets to one linked set).
>
> Without unordered() how is the CHM version accessed if the source is an ArrayList?
>

If the source has encounter order, the distinct operation will choose whether to preserve encounter order as per clause b.2, i.e. the properties of the terminal operation are a factor. Implementation-wise, the op checks whether the ORDERED flag is on the bit set of flags passed to it, so it is not as complicated as it sounds.
>> An intermediate operation must preserve the encounter order of the output stream if:
>>
>> a.1) the input stream to the intermediate operation has encounter order (either because the stream source has encounter order or because a previous intermediate operation injects encounter order); and
>> a.2) the terminal operation preserves encounter order.
>>
>> An intermediate operation need not preserve the encounter order of the output stream if:
>>
>> b.1) the input stream to the intermediate operation does not have encounter order (either because the stream source does not have encounter order or because a previous intermediate operation clears encounter order); or
>> b.2) the terminal operation does not preserve encounter order *and* the intermediate operation is in a sequence of operations, to be computed, where the last operation in the sequence is the terminal operation and all operations in the sequence are computed in parallel.
>>
>> Rule b.2 above ensures that encounter order is preserved for the following pipeline on the sequential().forEach():
>>
>>   list.parallelStream().distinct().sequential().forEach()
>>
>> i.e. the distinct() intermediate operation will preserve the encounter order of the list stream source.
>
> I find this result surprising (and disappointing). Users are going to be surprised by the poor performance of using parallelStream in this case.
>

The sequential() op is currently implemented as a full barrier to ensure elements are reported sequentially downstream in the same thread that created the stream. The following will produce the same output:

  list.stream().distinct().forEach(...)
  list.parallelStream().distinct().sequential().forEach(...)

which I think conforms to the principle of least surprise.

If performance is a concern and order is not, then one should not use sequential(), e.g. do:

  list.parallelStream().distinct().forEach(e -> { synchronized(this) { ... } })

or a concurrent collect.

Paul.
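For reference, the trade-off discussed above survives into java.util.stream as it finally shipped, where unordered() (which this draft had removed) is how a pipeline opts out of encounter order, and sequential() is no longer a barrier. A minimal sketch against the final API, not the 2013 draft:

```java
import java.util.List;
import java.util.stream.Collectors;

public class EncounterOrder {
    public static void main(String[] args) {
        List<Integer> list = List.of(3, 1, 3, 2, 1, 2);

        // Ordered source + order-preserving terminal: distinct() must keep
        // the encounter order of first occurrences, even in parallel.
        List<Integer> ordered = list.parallelStream()
                .distinct()
                .collect(Collectors.toList());
        System.out.println(ordered);  // [3, 1, 2]

        // unordered() clears encounter order, freeing distinct() to use a
        // concurrent algorithm; the *set* of elements is the same, but the
        // result order is unspecified, so we sort only to print stably here.
        List<Integer> unordered = list.parallelStream()
                .unordered()
                .distinct()
                .sorted()
                .collect(Collectors.toList());
        System.out.println(unordered);  // [1, 2, 3]
    }
}
```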
From brian.goetz at oracle.com Fri Feb 1 07:01:14 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 01 Feb 2013 10:01:14 -0500
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510BC5BC.2050509@redhat.com>
References: <510B2903.4070000@oracle.com> <510BC5BC.2050509@redhat.com>
Message-ID: <510BD8BA.2050501@oracle.com>

> I agree that the different parts should be separately inheritable. If a
> subtype overrides a method though, I think only the main doc should be
> inherited (since the implspec parts seem to be mainly for the benefit of
> implementers, and afaict you cannot override a default method without
> changing or dropping its implementation).

OK good, since that's how it was proposed.

> I'm not sure I'm really keen on separating specification from notes
> though. That seems pretty specific to organizational preferences and
> conventions. It seems to me that you'd want to inherit API notes along
> with API spec always, and you'd want to keep implementation notes with
> the implementation spec always, thus it just becomes a formatting
> nicety. Put another way, we've gone this long without; why do we
> suddenly need it now?

It is not like the "need it" function has gone precipitously from zero to one. Realistically, it's been creeping up slowly for years; API maintainers have had to go through all sorts of hoop-jumping to specify things, and much effort has been spent fixing spec bugs (or worse, living with bad spec) that amount to conflating normative/informative or API/implementation spec. Adding default methods will increase the need in a nontrivial way.

Until recently, there were only a few examples of optional methods in the JDK, mostly the mutative methods in Abstract{Collection,List,Map}. While there were only a few of them, we could get by with a crutch. But as we'll be getting more, the crutches don't scale.
People constantly make mistakes with the use of phrases like "this implementation" where it's not clear what that actually means. So, it's been a problem all along, been getting slowly worse with age, and we're about to dump some gas on an already-burning fire.

From scolebourne at joda.org Fri Feb 1 07:36:57 2013
From: scolebourne at joda.org (Stephen Colebourne)
Date: Fri, 1 Feb 2013 15:36:57 +0000
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510B2903.4070000@oracle.com>
References: <510B2903.4070000@oracle.com>
Message-ID: 

Thanks for the thread. I mostly agree.

On 1 February 2013 02:31, Brian Goetz wrote:
> We've tried this thread a few times without success, so let's try it again.

Should this be beyond Project Lambda EG?

> There are lots of things we might want to document about a method in an API.
> Historically we've framed them as either being "specification" (e.g.,
> necessary postconditions) or "implementation notes" (e.g., hints that give
> the user an idea what's going on under the hood.) But really, there are
> four boxes (and we've been cramming them into two):
>
> { API, implementation } x { specification, notes }

What about the difference between what implementors of "Java SE" need to do vs subclass writers? I'm guessing you intend both to be @implspec, but is that enough?

thanks
Stephen

From david.lloyd at redhat.com Fri Feb 1 05:40:12 2013
From: david.lloyd at redhat.com (David M. Lloyd)
Date: Fri, 01 Feb 2013 07:40:12 -0600
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510B2903.4070000@oracle.com>
References: <510B2903.4070000@oracle.com>
Message-ID: <510BC5BC.2050509@redhat.com>

I agree that the different parts should be separately inheritable.
If a subtype overrides a method though, I think only the main doc should be inherited (since the implspec parts seem to be mainly for the benefit of implementers, and afaict you cannot override a default method without changing or dropping its implementation).

I'm not sure I'm really keen on separating specification from notes though. That seems pretty specific to organizational preferences and conventions. It seems to me that you'd want to inherit API notes along with API spec always, and you'd want to keep implementation notes with the implementation spec always, thus it just becomes a formatting nicety. Put another way, we've gone this long without; why do we suddenly need it now?

On 01/31/2013 08:31 PM, Brian Goetz wrote:
> We've tried this thread a few times without success, so let's try it again.
>
> There have been a number of cases where it's not obvious how to document
> default methods. After some analysis, this appears to be just another
> case where the complexity was present in Java from day 1, and default
> methods simply bring it to the fore because the confusing cases are
> expected to come up more often. The following applies equally well to
> methods in abstract classes (or concrete classes) as to defaults.
>
> There are lots of things we might want to document about a method in an
> API. Historically we've framed them as either being "specification"
> (e.g., necessary postconditions) or "implementation notes" (e.g., hints
> that give the user an idea what's going on under the hood.) But really,
> there are four boxes (and we've been cramming them into two):
>
> { API, implementation } x { specification, notes }
>
> (We sometimes use the terms normative/informative to describe the
> difference between specification/notes.)
>
> As background, here are some example uses for default methods which vary
> in their "expected prevalence of overriding".
> I think the variety of
> use cases here has contributed to the confusion on how to document
> implementation characteristics. (Note that all of these have analogues
> in abstract classes too; one can find examples in Abstract{List,Map,Set}.)
>
> 1. Optional methods. This is when the default implementation is barely
> conformant, such as the following from Iterator:
>
>     public default void remove() {
>         throw new UnsupportedOperationException("remove");
>     }
>
> It adheres to its contract, because the contract is explicitly weak, but
> any class that cares about removal will definitely want to override it.
>
> 2. Methods with *reasonable* defaults but which might well be
> overridden by implementations that care enough. For example, again from
> Iterator:
>
>     default void forEach(Consumer<? super T> consumer) {
>         while (hasNext())
>             consumer.accept(next());
>     }
>
> This implementation is perfectly fine for most implementations, but some
> classes (e.g., ArrayList) might have the chance to do better, if their
> maintainers are sufficiently motivated to do so. The new methods on Map
> (e.g., putIfAbsent) are also in this bucket.
>
> 3. Methods where it's pretty unlikely anyone will ever override them,
> such as this method from Predicate:
>
>     public default Predicate<T> and(Predicate<? super T> p) {
>         Objects.requireNonNull(p);
>         return (T t) -> test(t) && p.test(t);
>     }
>
> These are all common enough cases. The primary reason that the Javadoc
> needs to provide some information about the implementation, separate
> from the API specification, is so that those who would extend these
> classes or interfaces can know which methods they need to / want to
> override. It should be clear from the doc that anyone who implements
> Iterator MUST implement remove() if they want removal to happen, CAN
> override forEach if they think it will result in better performance, and
> almost certainly doesn't need to override Predicate.and().
>
> The question is made more complicated by the prevalent use of the
> ambiguous phrase "this implementation." We often use "this
> implementation" to describe both normative and informative aspects of
> the implementation, and readers are left to guess which. (Does "this
> implementation" mean all versions of Oracle's JDK forever? The current
> version in Oracle's JDK? All versions of all JDKs? The implementation
> in a specific class? Could IBM's JDK throw a different exception than
> UOE from the default of Iterator.remove()? What happens when the doc is
> @inheritDoc'ed into a subclass that overrides the method? Etc. The
> phrase is too vague to be useful, and this vagueness has been the
> subject of many bug reports.)
>
> I think one measure of success of this effort should be "can we replace
> all uses of 'this implementation' with something that is more
> informative and fits neatly within the model."
>
> As said earlier, there are four boxes. Here are some descriptions of
> what belongs in each box.
>
> 1. API specification. This is the one we know and love; a description
> that applies equally to all valid implementations of the method,
> including preconditions, postconditions, etc.
>
> 2. API notes. Commentary, rationale, or examples pertaining to the API.
>
> 3. Implementation specification. This is where we say what it means to
> be a valid default implementation (or an overrideable implementation in
> a class), such as "throws UOE." Similarly, this is where we'd describe
> what the default for putIfAbsent does. It is from this box that the
> would-be implementer gets enough information to make a sensible decision
> as to whether or not to override.
>
> 4. Implementation notes. Informative notes about the implementation,
> such as performance characteristics that are specific to the
> implementation in this class in this JDK in this version, and might
> change. These things are allowed to vary across platforms, vendors and
> versions.
>
> Once we recognize that these are the four boxes, I think everything gets
> simpler.
>
> Strawman Proposal
> -----------------
>
> As a strawman proposal, here's one way to explicitly label the four
> boxes: add three new Javadoc tags, @apinote, @implspec, and @implnote.
> (The remaining box, API Spec, needs no new tag, since that's how Javadoc
> is used already.) @impl{spec,note} can apply equally well to a concrete
> method in a class or a default method in an interface.
>
> (Rule of engagement: bikeshedding the names will be interpreted as a
> waiver to ever say anything about the model or the semantics. So you
> may bikeshed, but it must be your last word on the topic.)
>
> /**
>  * ... API specifications ...
>  *
>  * @apinote
>  * ... API notes ...
>  *
>  * @implspec
>  * ... implementation specification ...
>  *
>  * @implnote
>  * ... implementation notes ...
>  *
>  * @param ...
>  * @return ...
>  */
>
> Applying this to some existing Javadoc, take AbstractMap.putAll:
>
>     Copies all of the mappings from the specified map to this map
>     (optional operation). The effect of this call is equivalent to
>     that of calling put(k, v) on this map once for each mapping from
>     key k to value v in the specified map. The behavior of this
>     operation is undefined if the specified map is modified while
>     the operation is in progress.
>
>     This implementation iterates over the specified map's
>     entrySet() collection, and calls this map's put operation
>     once for each entry returned by the iteration.
>
> The first paragraph is API specification and the second is
> implementation *specification*, as users expect the implementation in
> AbstractMap, regardless of version or vendor, to behave this way. The
> change here would be to replace "This implementation" with @implspec,
> and the ambiguity over "this implementation" goes away.
>
> The doc for Iterator.remove could be:
>
> /**
>  * Removes from the underlying collection the last element returned by
>  * this iterator (optional operation). This method can be called only
>  * once per call to next(). The behavior of an iterator is unspecified
>  * if the underlying collection is modified while the iteration is in
>  * progress in any way other than by calling this method.
>  *
>  * @implspec
>  * The default implementation must throw UnsupportedOperationException.
>  *
>  * @implnote
>  * For purposes of efficiency, the same UnsupportedOperationException
>  * instance is always thrown. [*]
>  */
>
> [*] We don't really intend to implement it this way; this is just an
> example of an @implnote.
>
> The doc for Map.putIfAbsent could be:
>
> /**
>  * If the specified key is not already associated with a value,
>  * associates it with the given value.
>  *
>  * @implspec
>  * The default behaves as if:
>  * <pre> {@code
>  * if (!map.containsKey(key))
>  *   return map.put(key, value);
>  * else
>  *   return map.get(key);
>  * } </pre>
>  *
>  * @implnote
>  * This default implementation is implemented essentially as described
>  * in the API note. This operation is not atomic. Atomicity, if desired,
>  * must be provided by a subclass implementation.
>  */
>
> Secondary: one can build on this to eliminate some common inheritance
> anomalies by making these inheritable separately, where @inheritDoc is
> interpreted as "inherit the stuff from the corresponding section." This
> is backward compatible because these sections do not yet exist in old
> docs. So to inherit API spec and implementation spec, you would do:
>
> /**
>  * {@inheritDoc}
>  * @implspec
>  * {@inheritDoc}
>  * ...
>  */

--
- DML

From dl at cs.oswego.edu Sun Feb 3 07:02:42 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 03 Feb 2013 10:02:42 -0500
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510B2903.4070000@oracle.com>
References: <510B2903.4070000@oracle.com>
Message-ID: <510E7C12.7020100@cs.oswego.edu>

On 01/31/13 21:31, Brian Goetz wrote:

> { API, implementation } x { specification, notes }
>
> /**
>  * ... API specifications ...
>  *
>  * @apinote
>  * ... API notes ...
>  *
>  * @implspec
>  * ... implementation specification ...
>  *
>  * @implnote
>  * ... implementation notes ...
>  *
>  * @param ...
>  * @return ...
>  */

This sounds about right. Even though 90% of future @implspecs will probably be for default methods, the need to use workarounds for lack of them regularly arises in other cases.

I'm not completely sure about @apinote though. For example, something saying that implementations may have a resource bound/capacity/default (without saying what it is) is part of a spec, not just a note, so I hope people don't use it as such. (Further, while there could then be an @implnote saying what that bound/etc value currently is, it is not always a great idea to do it when nothing else depends on the choice. Even saying what it is sometimes invites future problems when you need to change it.)
Similarly for some performance-related issues. For example, TreeMap should say as part of its spec that any implementation must have expected/amortized O(log n) get and put operations. It currently goes further and says that the implementation is based on red-black trees, but that should probably be in an @implnote. If we take these new categories seriously, we'll want to do a pass through most java.util (and related JDK) javadocs to carry this out consistently.

And so on. So the only remaining role of @apinote is for misc rationales, warnings about potential future changes, and things like that. Which usually flow better textually in javadoc if just made part of the description. So I don't see myself using it much if ever. But since I can imagine uses here and there, I guess I have nothing against it.

> Secondary: one can build on this to eliminate some common inheritance anomalies
> by making these inheritable separately, where @inheritDoc is interpreted as
> "inherit the stuff from the corresponding section." This is backward compatible
> because these sections do not yet exist in old docs. So to inherit API spec and
> implementation spec, you would do:
>
> /**
>  * {@inheritDoc}
>  * @implspec
>  * {@inheritDoc}
>  * ...
>  */

Yes. We've had to do huge amounts of copy/paste/hack of javadocs, especially in java.util.concurrent, to work around this problem.

So, all-in-all: Yes, please do this.

-Doug

From brian.goetz at oracle.com Sun Feb 3 07:17:52 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 03 Feb 2013 10:17:52 -0500
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510E7C12.7020100@cs.oswego.edu>
References: <510B2903.4070000@oracle.com> <510E7C12.7020100@cs.oswego.edu>
Message-ID: <510E7FA0.8060309@oracle.com>

"not completely sure" is a reasonable place to be with @apinote. Of the four, it's definitely the least useful.
But I think the "2x2" structure as proposed is a more sound basis than the "3" you'd get by taking it out.

Your other notes amount to "people could get it wrong." Which is true, though there's plenty of room to get it wrong with the current scheme too. I have a hard time believing we'll make it worse. And it clearly addresses some long-standing gaps in our ability to document, which are about to get broader with the introduction of default methods.

On 2/3/2013 10:02 AM, Doug Lea wrote:
> On 01/31/13 21:31, Brian Goetz wrote:
>
>> { API, implementation } x { specification, notes }
>>
>> /**
>>  * ... API specifications ...
>>  *
>>  * @apinote
>>  * ... API notes ...
>>  *
>>  * @implspec
>>  * ... implementation specification ...
>>  *
>>  * @implnote
>>  * ... implementation notes ...
>>  *
>>  * @param ...
>>  * @return ...
>>  */
>
> This sounds about right. Even though 90% of future @implspecs will
> probably be for default methods, the need to use workarounds
> for lack of them regularly arises in other cases.
>
> I'm not completely sure about @apinote though.
> For example, something saying that implementations
> may have resource bound/capacity/default (without saying what
> it is), is part of a spec, not just a note, so I
> hope people don't use it as such.
> (Further, while there could then be an @implnote
> saying what that bound/etc value currently is, it is not
> always a great idea to do it when nothing else depends
> on the choice. Even saying what it is sometimes invites future
> problems when you need to change it.)
>
> Similarly for some performance-related issues. For example
> TreeMap should say as part of its spec that any implementation
> must have expected/amortized O(log n) get and put operations.
> It currently goes further and says that the implementation is
> based on red-black trees, but that should probably be in
> an @implnote.
> If we take these new categories
> seriously, we'll want to do a pass through most java.util
> (and related JDK) javadocs to carry this out consistently.
>
> And so on. So the only remaining role of @apinote is
> for misc rationales, warnings about potential future
> changes, and things like that. Which usually textually
> flow better in javadoc if just made part of the description.
> So I don't see myself using it much if ever. But since I
> can imagine uses here and there, I guess I have nothing against
> it.
>
>> Secondary: one can build on this to eliminate some common inheritance
>> anomalies by making these inheritable separately, where @inheritDoc is
>> interpreted as "inherit the stuff from the corresponding section."
>> This is backward compatible because these sections do not yet exist
>> in old docs. So to inherit API spec and implementation spec, you
>> would do:
>>
>> /**
>>  * {@inheritDoc}
>>  * @implspec
>>  * {@inheritDoc}
>>  * ...
>>  */
>
> Yes. We've had to do huge amounts of copy/paste/hack of
> javadocs, especially in java.util.concurrent, to work around this
> problem.
>
> So, all-in-all: Yes, please do this.
>
> -Doug

From dl at cs.oswego.edu Sun Feb 3 07:40:48 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 03 Feb 2013 10:40:48 -0500
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510E7FA0.8060309@oracle.com>
References: <510B2903.4070000@oracle.com> <510E7C12.7020100@cs.oswego.edu> <510E7FA0.8060309@oracle.com>
Message-ID: <510E8500.90907@cs.oswego.edu>

On 02/03/13 10:17, Brian Goetz wrote:
> Your other notes amount to "people could get it wrong." Which is true, though
> there's plenty of room to get it wrong with the current scheme too. I have a
> hard time believing we'll make it worse.

Absolutely. I don't mean to imply anything other than that this is a big improvement.
-Doug

From brian.goetz at oracle.com Mon Feb 4 12:37:11 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 04 Feb 2013 15:37:11 -0500
Subject: explode (was: Stream method survey responses)
In-Reply-To: <5101706E.3030601@oracle.com>
References: <5101706E.3030601@oracle.com>
Message-ID: <51101BF7.9070409@oracle.com>

> From this, here's what I think is left to do:
> - More work on explode needed
> ...

Circling back to this. Clearly explode() is not done. Let me try and capture all the relevant info in one place.

Let's start with some background. Why do we want this method at all? Well, it's really useful! It's fairly common to do things like:

  Stream<Order> orders = ...
  Stream<LineItem> lineItems  // explicit declaration for clarity
      = orders.explode(... order.getLineItems() ...)

and it is often desirable to then do streamy things on the stream of line items. Those who have used flatMap in Scala get used to having it quite quickly, and would be very sad if it were taken away. Ask Don how many examples in his katas use it. (Doug will also point out that if you have flatMap, you don't really need map or filter -- see examples in CHM -- since both can be layered atop flatMap, modulo performance concerns.)

(It does have the potential to put some stress on the system when an element can be mapped to very large collections, because it sucks you into the problem of nested parallelism. (This is the inverse of another problem we already have, which is when filter stages have very high selectivity, and we end up with a lot of splitting overhead for the number of elements that end up at the tail of the pipeline.) But when mapping an element to a small number of other elements, as is common in a lot of use cases, there is generally no problem here.)

Scala has a method flatMap on whatever they'd call Stream, which takes a function T -> Stream[U] and produces a Stream[U].
More generally, this shape of flatMap applies (and is supported by higher-kinded generics) to many traits in the Scala library, where you have a Foo[A] and the flatMap method takes an A -> Foo[B] and produces a Foo[B]. (Our generics can't capture this.) This is the shape for flatMap that everyone really wants.

But here, we run into unfortunate reality: this works great in functional languages with cheap structural types, but that's not Java. I took it as a key design goal for flatMap/mapMulti/explode that: if an element t maps to nothing, the implementation should do as close to zero work as possible.

This rules out shaping flatMap as:

  <U> Stream<U> flatMap(Function<T, Collection<U>> mapper)

because, if you don't already have the collection lying around, the lambdas for this are nasty to write (try writing one, and you'll see what I mean), inefficient to execute, and create work for the library to iterate the result. In the limit, where t maps to the empty collection, creating and iterating an empty collection for each element is nuts. (By contrast, in languages like Haskell, wrapping elements with lists is very cheap.) However, the above shape is desirable as a convenience in the case you already do have a collection lying around. So let's put it in the bin of "nice conveniences to also deliver when we solve the main problem."

It also rules out shaping flatMap as:

  <U> Stream<U> flatMap(Function<T, Stream<U>> mapper)

because that's even worse -- creating ad-hoc streams is even more expensive than creating collections.

To simplify, imagine there are two use cases we have to satisfy:
- map element to generator (general case)
- map element to collection (convenience case)

The other cases (to array, to stream) are similar enough to the collection case.
To illustrate the general "generator" case, here's an example of a lambda (using the current API) that takes a Stream of String and produces a Stream of Integer values which are the characters of that stream:

  (sink, element) -> {
      for (int i = 0; i < element.length(); i++)
          sink.accept((int) element.charAt(i));
  }

Compare that to the collection-bearing version:

  element -> {
      ArrayList<Integer> list = new ArrayList<>();
      for (int i = 0; i < element.length(); i++)
          list.add((int) element.charAt(i));
      return list;
  }

or, short-circuiting the empty case:

  element -> {
      if (element.length() == 0)
          return Collections.emptyList();
      ArrayList<Integer> list = new ArrayList<>();
      for (int i = 0; i < element.length(); i++)
          list.add((int) element.charAt(i));
      return list;
  }

(This is the Collection case.)

Erasure plays a role here too. Ideally, it would be nice to overload methods for

  flatMap(Function<T, Collection<U>>)
  flatMap(Function<T, Stream<U>>)

but under erasure the overloads clash. The original API had only:

  <U> Stream<U> flatMap(MultiFunction<T, U> mf)

where MultiFunction<T, U> was (T, Consumer<U>) -> void. If users already had a Collection lying around, they had to iterate it themselves:

  (element, sink) -> {
      for (U u : findCollection(t))
          sink.accept(u);
  }

which isn't terrible but people didn't like it -- I think not because it was hard to read, but hard to figure out how to use flatMap at all. The current iteration provides a helper class with helper methods for handling collections, arrays, and streams, but you still have to wrap your head around why you're being passed two things before doing anything -- and I think it's the "before doing anything" part that really messes people up.

So, here are two alternatives that I hope may be better (and not run into problems with type inference).

Alternative A: overloading on method names.

  // Map T -> Collection<U>
  public <U> StreamA<U> explodeToCollection(Function<T, Collection<U>> mapper);

  // Map T -> U[]
  public <U> StreamA<U> explodeToArray(Function<T, U[]> mapper);

  // Generator case -- pass a T and a Consumer<U>
  public <U> StreamA<U> explodeToConsumer(BiConsumer<T, Consumer<U>> mapper);

  // Alternate version of generator case -- with named SAM instead
  public <U> StreamA<U> altExplodeToConsumer(Exploder<T, U> mapper);

  interface Exploder<T, U> {
      void explode(T element, Consumer<U> consumer);
  }

Here, we have various explodeToXxx methods (naming is purely illustrative) that defeat the erasure problem. Users seeking the T -> Collection version can use the appropriate versions with no problem.
When said users discover that their performance sucks, they have motivation to learn to use the more efficient generator version.

Usage examples:

  StreamA<Integer> a1 = a.explodeToArray(i -> new Integer[] { i });
  StreamA<Integer> a2 = a.explodeToCollection(i -> Collections.singleton(i));
  StreamA<Integer> a3 = a.explodeToConsumer((i, sink) -> sink.accept(i));

Alternative B: overload on SAMs. This involves three SAMs:

  interface Exploder<T, U> {
      void explode(T element, Consumer<U> consumer);
  }

  interface MapperToCollection<T, U> extends Function<T, Collection<U>> { }

  interface MapperToArray<T, U> extends Function<T, U[]> { }

And three overloaded explode() methods:

  public <U> StreamB<U> explode(MapperToCollection<T, U> exploder);
  public <U> StreamB<U> explode(MapperToArray<T, U> exploder);
  public <U> StreamB<U> explode(Exploder<T, U> exploder);

Usage examples:

  StreamB<Integer> b1 = b.explode(i -> new Integer[] { i });
  StreamB<Integer> b2 = b.explode(i -> Collections.singleton(i));
  StreamB<Integer> b3 = b.explode((i, sink) -> sink.accept(i));

I think the second approach is pretty decent. Users can easily understand the first two versions and use them while wrapping their head around the third.

From kevinb at google.com Mon Feb 4 12:52:17 2013
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 4 Feb 2013 12:52:17 -0800
Subject: explode (was: Stream method survey responses)
In-Reply-To: <51101BF7.9070409@oracle.com>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com>
Message-ID: 

Only a quick question first:

On Mon, Feb 4, 2013 at 12:37 PM, Brian Goetz wrote:

> The original API had only:
>
>   <U> Stream<U> flatMap(MultiFunction<T, U> mf)
>
> where MultiFunction<T, U> was (T, Consumer<U>) -> void. If users already had a
> Collection lying around, they had to iterate it themselves:
>
>   (element, sink) -> {
>       for (U u : findCollection(t))
>           sink.accept(u);
>   }

Could that simply be (t, sink) -> findCollection(t).forEach(sink) ?

--
Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From brian.goetz at oracle.com Mon Feb 4 12:53:53 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 04 Feb 2013 15:53:53 -0500
Subject: explode
In-Reply-To: 
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com>
Message-ID: <51101FE1.7020108@oracle.com>

> (element, sink) -> {
>     for (U u : findCollection(t))
>         sink.accept(u);
> }
>
> Could that simply be (t, sink) -> findCollection(t).forEach(sink) ?

Yes.

From brian.goetz at oracle.com Mon Feb 4 12:58:35 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 04 Feb 2013 15:58:35 -0500
Subject: explode
In-Reply-To: <51101BF7.9070409@oracle.com>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com>
Message-ID: <511020FB.5000509@oracle.com>

> Alternative A: overloading on method names.
> Alternative B: overload on SAMs. This involves three SAMs:

To contrast these:
- A has uglier method names, but fewer new types
- B has prettier method names (and therefore prettier use-site usage), but introduces more new ancillary types and puts more stress on type inference.

Specifically, I am wondering how we're going to represent "explode Foo to ints" -- which is probably an important use case.

From brian.goetz at oracle.com Tue Feb 5 08:53:42 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 05 Feb 2013 11:53:42 -0500
Subject: Collectors update
In-Reply-To: <51085788.4080705@univ-mlv.fr>
References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr>
Message-ID: <51113916.1010809@oracle.com>

>> 4. Rejigger Partition to return an array again, with an explicit
>> lambda (which will likely be an array ctor ref) to make the array.
>> Eliminated the silly Partition class.
>
> Please don't do that, it's pure evil.
> public static <T> Collector<T, Stream<T>[]>
> partitioningBy(Predicate<? super T> predicate, IntFunction<Stream<T>[]>
> arraySupplier) {

I've refactored this to make the partition collectors return Map<Boolean, X>.
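For reference, the refactor described here is what shipped: Collectors.partitioningBy in the final Java 8 API yields a Map keyed by Boolean, with both keys always present. A minimal sketch against that final API (this sketch uses List.of, a Java 9 convenience, for brevity):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartitionDemo {
    public static void main(String[] args) {
        // Partition by parity; here X is List<Integer>, the default downstream.
        Map<Boolean, List<Integer>> byParity = Stream.of(1, 2, 3, 4, 5)
                .collect(Collectors.partitioningBy(i -> i % 2 == 0));

        System.out.println(byParity.get(true));   // [2, 4]
        System.out.println(byParity.get(false));  // [1, 3, 5]

        // Both keys are present even when the stream is empty.
        Map<Boolean, List<Integer>> empty = Stream.<Integer>empty()
                .collect(Collectors.partitioningBy(i -> i % 2 == 0));
        System.out.println(empty.get(true).isEmpty()
                && empty.get(false).isEmpty());   // true
    }
}
```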
From forax at univ-mlv.fr Tue Feb 5 11:11:31 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 05 Feb 2013 20:11:31 +0100 Subject: Collectors update In-Reply-To: <51113916.1010809@oracle.com> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> Message-ID: <51115963.2060209@univ-mlv.fr> On 02/05/2013 05:53 PM, Brian Goetz wrote: >>> 4. Rejigger Partition to return an array again, with an explicit >>> lambda (which will likely be an array ctor ref) to make the array. >>> Eliminated the silly Partition class. >> >> Please don't do that, it's pure evil. >> public static Collector[]> >> partitioningBy(Predicate predicate, IntFunction[]> >> arraySupplier) { > > I've refactored this to make the partition collectors return > Map. I think returning a boolean -> T (or Boolean -> T) is better because it's conceptually more lightweight than a Map. I expect to see more function instead of a Map returned as result of a method. Otherwise, like any other Map returned by the JDK, it should be serializable. R?mi From kevinb at google.com Tue Feb 5 12:20:33 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Tue, 5 Feb 2013 12:20:33 -0800 Subject: Collectors update In-Reply-To: <51115963.2060209@univ-mlv.fr> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> Message-ID: On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax wrote: 4. Rejigger Partition to return an array again, with an explicit >>>> lambda (which will likely be an array ctor ref) to make the array. >>>> Eliminated the silly Partition class. >>>> >>> >>> Please don't do that, it's pure evil. >>> public static Collector[]> >>> partitioningBy(Predicate predicate, IntFunction[]> >>> arraySupplier) { >>> >> >> I've refactored this to make the partition collectors return Map> X>. 
>> > > I think returning a boolean -> T (or Boolean -> T) is better because it's > conceptually more lightweight than a Map. > I expect to see more function instead of a Map returned as result of a > method. > I'd have to disagree; I expect function objects to be little things I pass * in*, but I think it's more intuitive to expect a proper data structure back out. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Tue Feb 5 12:22:02 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 05 Feb 2013 15:22:02 -0500 Subject: Collectors update In-Reply-To: References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> Message-ID: <511169EA.1030109@oracle.com> I concur with Kevin. On 2/5/2013 3:20 PM, Kevin Bourrillion wrote: > On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax > wrote: > > 4. Rejigger Partition to return an array again, with an > explicit > lambda (which will likely be an array ctor ref) to make > the array. > Eliminated the silly Partition class. > > > Please don't do that, it's pure evil. > public static Collector[]> > partitioningBy(Predicate predicate, > IntFunction[]> > arraySupplier) { > > > I've refactored this to make the partition collectors return > Map. > > > I think returning a boolean -> T (or Boolean -> T) is better because > it's conceptually more lightweight than a Map. > I expect to see more function instead of a Map returned as result of > a method. > > > I'd have to disagree; I expect function objects to be little things I > pass /in/, but I think it's more intuitive to expect a proper data > structure back out. > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > From forax at univ-mlv.fr Tue Feb 5 12:46:57 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 05 Feb 2013 21:46:57 +0100 Subject: Collectors update In-Reply-To: <511169EA.1030109@oracle.com> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> <511169EA.1030109@oracle.com> Message-ID: <51116FC1.1030206@univ-mlv.fr> On 02/05/2013 09:22 PM, Brian Goetz wrote: > I concur with Kevin. We should remove Consumer.chain() in that case. R?mi > > On 2/5/2013 3:20 PM, Kevin Bourrillion wrote: >> On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax > > wrote: >> >> 4. Rejigger Partition to return an array again, with an >> explicit >> lambda (which will likely be an array ctor ref) to make >> the array. >> Eliminated the silly Partition class. >> >> >> Please don't do that, it's pure evil. >> public static Collector[]> >> partitioningBy(Predicate predicate, >> IntFunction[]> >> arraySupplier) { >> >> >> I've refactored this to make the partition collectors return >> Map. >> >> >> I think returning a boolean -> T (or Boolean -> T) is better because >> it's conceptually more lightweight than a Map. >> I expect to see more function instead of a Map returned as result of >> a method. >> >> >> I'd have to disagree; I expect function objects to be little things I >> pass /in/, but I think it's more intuitive to expect a proper data >> structure back out. >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> From brian.goetz at oracle.com Tue Feb 5 12:54:01 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 05 Feb 2013 15:54:01 -0500 Subject: Collectors update In-Reply-To: <51116FC1.1030206@univ-mlv.fr> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> <511169EA.1030109@oracle.com> <51116FC1.1030206@univ-mlv.fr> Message-ID: <51117169.3090605@oracle.com> That's silly. We didn't say anything about "nothing should return a function." Kevin is completely right that collect() is a data-oriented operation and should return a real data structure. Consumer.chain() is a higher-order function; functions in, functions out -- no data involved. On 2/5/2013 3:46 PM, Remi Forax wrote: > On 02/05/2013 09:22 PM, Brian Goetz wrote: >> I concur with Kevin. > > We should remove Consumer.chain() in that case. > > R?mi > >> >> On 2/5/2013 3:20 PM, Kevin Bourrillion wrote: >>> On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax >> > wrote: >>> >>> 4. Rejigger Partition to return an array again, with an >>> explicit >>> lambda (which will likely be an array ctor ref) to make >>> the array. >>> Eliminated the silly Partition class. >>> >>> >>> Please don't do that, it's pure evil. >>> public static Collector[]> >>> partitioningBy(Predicate predicate, >>> IntFunction[]> >>> arraySupplier) { >>> >>> >>> I've refactored this to make the partition collectors return >>> Map. >>> >>> >>> I think returning a boolean -> T (or Boolean -> T) is better because >>> it's conceptually more lightweight than a Map. >>> I expect to see more function instead of a Map returned as result of >>> a method. >>> >>> >>> I'd have to disagree; I expect function objects to be little things I >>> pass /in/, but I think it's more intuitive to expect a proper data >>> structure back out. >>> >>> >>> -- >>> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >>> > From forax at univ-mlv.fr Tue Feb 5 13:38:13 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 05 Feb 2013 22:38:13 +0100 Subject: Collectors update In-Reply-To: <51117169.3090605@oracle.com> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> <511169EA.1030109@oracle.com> <51116FC1.1030206@univ-mlv.fr> <51117169.3090605@oracle.com> Message-ID: <51117BC5.8000501@univ-mlv.fr> On 02/05/2013 09:54 PM, Brian Goetz wrote: > That's silly. We didn't say anything about "nothing should return a > function." > > Kevin is completely right that collect() is a data-oriented operation > and should return a real data structure. > > Consumer.chain() is a higher-order function; functions in, functions > out -- no data involved. Why using Map is a "real data structure" ? I think I prefer to have a real type like Partition (as Don said). R?mi > > On 2/5/2013 3:46 PM, Remi Forax wrote: >> On 02/05/2013 09:22 PM, Brian Goetz wrote: >>> I concur with Kevin. >> >> We should remove Consumer.chain() in that case. >> >> R?mi >> >>> >>> On 2/5/2013 3:20 PM, Kevin Bourrillion wrote: >>>> On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax >>> > wrote: >>>> >>>> 4. Rejigger Partition to return an array again, >>>> with an >>>> explicit >>>> lambda (which will likely be an array ctor ref) to >>>> make >>>> the array. >>>> Eliminated the silly Partition class. >>>> >>>> >>>> Please don't do that, it's pure evil. >>>> public static Collector[]> >>>> partitioningBy(Predicate predicate, >>>> IntFunction[]> >>>> arraySupplier) { >>>> >>>> >>>> I've refactored this to make the partition collectors return >>>> Map. >>>> >>>> >>>> I think returning a boolean -> T (or Boolean -> T) is better >>>> because >>>> it's conceptually more lightweight than a Map. >>>> I expect to see more function instead of a Map returned as >>>> result of >>>> a method. 
>>>> >>>> >>>> I'd have to disagree; I expect function objects to be little things I >>>> pass /in/, but I think it's more intuitive to expect a proper data >>>> structure back out. >>>> >>>> >>>> -- >>>> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >>>> >> From brian.goetz at oracle.com Wed Feb 6 14:12:15 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 06 Feb 2013 17:12:15 -0500 Subject: Collectors update redux Message-ID: <5112D53F.2080205@oracle.com> Did more tweaking with Collectors. Recall there are two basic forms of the collect method: The most basic one is the "on ramp", which doesn't require any understanding of Collector or the combinators therein; it is basically the mutable version of reduce. It looks like: collect(() -> R, (R,T) -> void, (R,R) -> void) The API shape is defined so that most invocations will work with method references: // To ArrayList collect(ArrayList::new, ArrayList::add, ArrayList::addAll) Note that this works in parallel too; we create list at the leaves with ::add, and merge them up the tree with ::addAll. // String concat collect(StringBuilder::new, StringBuilder::append, StringBuilder::append) // Turn an int stream to a BitSet with those bits set collect(BitSet::new, BitSet::set, BitSet::or) // String join with delimiter collect(() -> new StringJoiner(", "), StringJoiner::append, StringJoiner::append) Again, all these work in parallel. Digression: the various forms of reduce/etc form a ladder in terms of complexity: If you understand reduction, you can understand... ...reduce(T, BinaryOperator) If you understand the above + Optional, you can then understand... ...reduce(BinaryOperator) If you understand the above + "fold" (nonhomogeneous reduction), you can then understand... ...reduce(U, BiFunction accumulator, BinaryOperator); If you understand the above + "mutable fold" (inject), you can then understand... 
...collect(Supplier, (R,T) -> void, (R,R) -> void) If you understand the above + "Collector", you can then understand... ...collect(Collector) This is all supported by the principle of commensurate effort; learn a little more, can do a little more. OK, exiting digression, moving to the end of the list, those that use "canned" Collectors. collect(Collector) collectUnordered(Collector) Collectors are basically a tuple of three lambdas and a boolean indicating whether the Collector can handle concurrent insertion: Collector = { () -> R, (R,T) -> void, (R,R) -> R, isConcurrent } Note there is a slight difference in the last argument, a BinaryOperator rather than a BiConsumer. The BinaryOperator form is more flexible (it can support appending two Lists into a tree representation without copying the elements, whereas the (R,R) -> void form can't.) This asymmetry is a rough edge, though in each case, the shape is "locally" optimal (in the three-arg version, the void form supports method refs better; in the Collector version, the result is more flexible, and that's where we need the flexibility.) But we could make them consistent at the cost of the above uses becoming more like: collect(StringBuilder::new, StringBuilder::append, (l, r) -> { l.append(r); return l; }) Overall I think the current API yields better client code at the cost of this slightly rough edge. 
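The "on ramp" examples above translate directly to runnable code; a sketch exercising the same method references (shapes as in the final API, which kept this three-argument collect with a (R,R) -> void combiner):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class OnRampCollect {
    public static void main(String[] args) {
        // supplier / accumulator / combiner -- the mutable analogue of reduce
        List<String> list = Stream.of("a", "b", "c")
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);

        // String concat: leaves accumulate with append(String),
        // partial builders merge with append(CharSequence)
        String s = Stream.of("a", "b", "c")
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();

        // IntStream to BitSet: accumulate with set(int), merge with or()
        BitSet bits = IntStream.of(1, 3, 5)
                .collect(BitSet::new, BitSet::set, BitSet::or);

        System.out.println(list + " " + s + " " + bits);
    }
}
```

All three run identically in parallel: the supplier creates a container per leaf, the accumulator fills it, and the combiner merges containers up the tree.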
The set of Collectors now includes: toCollection(Supplier) toList() toSet() toStringBuilder() toStringJoiner(delimiter) // mapping combinators (plus primitive specializations) mapping(T->U, Collector) // Single-level groupBy groupingBy(T->K) // groupBy with downstream Collector groupingBy(T->K, Collector) // grouping plus reduce groupingReduce(T->K, BinaryOperator) // reduce only groupingReduce(T->K, T->U, BinaryOperator) // map+reduce // join (nee mappedTo) joiningWith(T -> U) // produces Map<T, U> // partition partitioningBy(Predicate) partitioningBy(Predicate, Collector) partitioningReduce(Predicate, BinaryOperator) partitioningReduce(Predicate, T->U, BinaryOperator) // statistics (gathers sum, count, min, max, average) toLongStatistics() toDoubleStatistics() Plus, concurrent versions of most of these (which are suitable for unordered/contended/forEach-style execution.) Plus versions that let you offer explicit constructors for maps and collections. While these may seem like a lot, the implementations are highly compact -- all of these together, plus supporting machinery, fit in 500 LoC. These Collectors are designed around composability. (It is vaguely frustrating that we even have to separate the "with downstream Collector" versions from the reducing versions.) So they each have a form where you can do some level of categorization and then use a downstream collector to do further computation. This is very powerful. Examples, again using the familiar problem domain of transactions: class Txn { Buyer buyer(); Seller seller(); String description(); int amount(); } Transactions by buyer: Map<Buyer, List<Txn>> m = txns.collect(groupingBy(Txn::buyer)); Highest-dollar transaction by buyer: Map<Buyer, Txn> m = txns.collect( groupingReduce(Txn::buyer, Comparators.greaterOf( Comparators.comparing(Txn::amount)))); Here, comparing() takes the Txn -> amount function, and produces a Comparator; greaterOf(comparator) turns that Comparator into a BinaryOperator that corresponds to "max by comparator".
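For reference, groupingReduce and Comparators.greaterOf did not survive under those names; in the API as it shipped, the same "highest-dollar transaction by buyer" computation is spelled with groupingBy plus a reducing downstream collector (greaterOf(comparator) corresponds to BinaryOperator.maxBy(comparator)). A sketch, with Txn as a stand-in class and Buyer simplified to a String:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

public class GroupingReduceDemo {
    // Stand-in for the Txn class in the examples above.
    static class Txn {
        final String buyer; final int amount;
        Txn(String buyer, int amount) { this.buyer = buyer; this.amount = amount; }
        String buyer() { return buyer; }
        int amount() { return amount; }
    }

    public static void main(String[] args) {
        List<Txn> txns = Arrays.asList(
                new Txn("ann", 5), new Txn("bob", 7), new Txn("ann", 9));

        // Highest-dollar transaction per buyer: groupingBy + max-by reduction.
        Map<String, Optional<Txn>> biggest = txns.stream().collect(
                Collectors.groupingBy(Txn::buyer,
                        Collectors.reducing(
                                BinaryOperator.maxBy(Comparator.comparingInt(Txn::amount)))));
        System.out.println(biggest.get("ann").get().amount()); // 9

        // Just the amount, not the transaction -- the analogue of
        // groupingReduce(Txn::buyer, Txn::amount, Integer::max):
        Map<String, Integer> maxAmount = txns.stream()
                .collect(Collectors.toMap(Txn::buyer, Txn::amount, Integer::max));
        System.out.println(maxAmount.get("ann")); // 9
    }
}
```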
We then reduce on that, yielding highest-dollar transaction per buyer. Alternately, if you want the number, not the transaction: Map<Buyer, Integer> m = txns.collect(groupingReduce(Txn::buyer, Txn::amount, Integer::max)); Transactions by buyer, seller: Map<Buyer, Map<Seller, List<Txn>>> m = txns.collect(groupingBy(Txn::buyer, groupingBy(Txn::seller))); Transaction volume statistics by buyer, seller: Map<Buyer, Map<Seller, LongStatistics>> m = txns.collect(groupingBy(Txn::buyer, groupingBy(Txn::seller, mapping(Txn::amount, toLongStatistics())))); The statistics let you get at min, max, sum, count, and average from a single pass on the data (this trick taken from ParallelArray.) We can mix and match at various levels. For example: Transactions by buyer, partitioned into "large/small" groups: Predicate<Txn> isLarge = t -> t.amount() > BIG; Map<Buyer, Map<Boolean, List<Txn>>> m = txns.collect(groupingBy(Txn::buyer, partitioningBy(isLarge))); Or, turning it around: Map<Boolean, Map<Buyer, List<Txn>>> m = txns.collect(partitioningBy(isLarge, groupingBy(Txn::buyer))); Because Collector is public, Kevin can write and publish Guava-multimap-bearing versions of these -- probably in about ten minutes. From brian.goetz at oracle.com Wed Feb 6 14:32:28 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 06 Feb 2013 17:32:28 -0500 Subject: explode In-Reply-To: <51101BF7.9070409@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> Message-ID: <5112D9FC.9010707@oracle.com> Guys, we need to close on the open Stream API items relatively soon. Maybe we're almost there on flatMap. Of the alternatives for flatMap below, while Alternative B is attractive from a client code perspective, I think Alternative A is less risky with respect to stressing the compiler (and also introduces fewer new types.) So, semi-concrete proposal: Stream<U> flatMapToCollection(Function<T, Collection<U>>) Stream<U> flatMapToArray(Function<T, U[]>) // do we even need this?
Stream<U> flatMap(Function<T, Stream<U>>) Stream<U> flatMap(FlatMapper<T, U>) where interface FlatMapper<T, U> { void explodeInto(T t, Consumer<U> consumer); } with specializations for primitives: IntStream flatMap(FlatMapper.OfInt) ... etc We can then position flatMap as the "advanced" version, so from a "graduated learning" perspective, people will find fMTC first, if that meets their needs, great, and the Javadoc for fMTC can guide them to fM for the more advanced cases. On 2/4/2013 3:37 PM, Brian Goetz wrote: >> From this, here's what I think is left to do: >> - More work on explode needed > > ... > > Circling back to this. Clearly explode() is not done. Let me try and > capture all the relevant info in one place. > > Let's start with some background. Why do we want this method at all? > Well, it's really useful! It's fairly common to do things like: > > Stream<Order> orders = ... > Stream<LineItem> lineItems // explicit declaration for clarity > = orders.explode(... order.getLineItems() ...) > > and it is often desirable to then do streamy things on the stream of > line items. Those who have used flatMap in Scala get used to having it > quite quickly, and would be very sad if it were taken away. Ask Don how > many examples in his katas use it. (Doug will also point out that if > you have flatMap, you don't really need map or filter -- see examples in > CHM -- since both can be layered atop flatMap, modulo performance > concerns.) > > (It does have the potential to put some stress on the system when an > element can be mapped to very large collections, because it sucks you > into the problem of nested parallelism. (This is the inverse of another > problem we already have, which is when filter stages have very high > selectivity, and we end up with a lot of splitting overhead for the > number of elements that end up at the tail of the pipeline.) But when > mapping an element to a small number of other elements, as is common in > a lot of use cases, there is generally no problem here.)
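For concreteness, the orders-to-line-items example above under the Stream-returning shape (the form that eventually shipped as Stream.flatMap); Order and LineItem here are hypothetical stand-in classes:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapOrders {
    static class LineItem {
        final String sku;
        LineItem(String sku) { this.sku = sku; }
    }
    static class Order {
        final List<LineItem> items;
        Order(LineItem... items) { this.items = Arrays.asList(items); }
        List<LineItem> getLineItems() { return items; }
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order(new LineItem("a"), new LineItem("b")),
                new Order(),   // maps to nothing: contributes an empty stream
                new Order(new LineItem("c")));

        List<String> skus = orders.stream()
                .flatMap(o -> o.getLineItems().stream())   // T -> Stream<U>
                .map(li -> li.sku)
                .collect(Collectors.toList());

        System.out.println(skus); // [a, b, c]
    }
}
```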
> > Scala has a method flatMap on whatever they'd call Stream<T>, which > takes a function > > T -> Stream<U> > > and produces a Stream<U>. More generally, this shape of flatMap applies > (and is supported by higher-kinded generics) to many traits in the Scala > library, where you have a Foo[A] and the flatMap method takes an A -> > Foo[B] and produces a Foo[B]. (Our generics can't capture this.) > > This is the shape for flatMap that everyone really wants. But here, we > run into unfortunate reality: this works great in functional languages > with cheap structural types, but that's not Java. > > I took it as a key design goal for flatMap/mapMulti/explode that: if an > element t maps to nothing, the implementation should do as close to zero > work as possible. > > This rules out shaping flatMap as: > > Stream<U> flatMap(Function<T, Collection<U>>) > > because, if you don't already have the collection lying around, the > lambdas for this are nasty to write (try writing one, and you'll see > what I mean), inefficient to execute, and create work for the library to > iterate the result. In the limit, where t maps to the empty collection, > creating and iterating an empty collection for each element is nuts. (By > contrast, in languages like Haskell, wrapping elements with lists is > very cheap.) > > However, the above shape is desirable as a convenience in the case you > already do have a collection lying around. So let's put it in the bin > of "nice conveniences to also deliver when we solve the main problem." > > It also rules out shaping flatMap as: > > Stream<U> flatMap(Function<T, Stream<U>>) > > because that's even worse -- creating ad-hoc streams is even more > expensive than creating collections. > > To simplify, imagine there are two use cases we have to satisfy: > > - map element to generator (general case) > - map element to collection (convenience case) > > The other cases (to array, to stream) are similar enough to the > collection case.
> > To illustrate the general "generator" case, here's an example of a > lambda (using the current API) that takes a Stream of String and > produces a Stream of Integer values which are the characters of that > stream: > > (sink, element) -> { > for (int i=0; i<element.length(); i++) > sink.send(element.charAt(i)); > } > > It's efficient (no input, no output) and pretty easy to see what's going > on. Would be nicer if we could spell "sink.send" as "yield", but oh > well. Here's how we'd have to write that if we didn't support the > generator case: > > (element) -> { > ArrayList<Integer> list = new ArrayList<>(); > for (int i=0; i<element.length(); i++) > list.add(element.charAt(i)); > return list; > } > > Bigger and less efficient. And it gets uglier if we want to try and > optimize away the list creation in the empty case: > > (element) -> { > if (element.length() == 0) > return Collections.emptyList(); > ArrayList<Integer> list = new ArrayList<>(); > for (int i=0; i<element.length(); i++) > list.add(element.charAt(i)); > return list; > } > > We're really starting to lose sight of what this lambda does. (Hopefully > this will put to bed the notion that all we need is the T->Collection > case.) > > Erasure plays a role here too. Ideally, it would be nice to overload > methods for > > flatMap(Function<T, Collection<U>>) > flatMap(Function<T, U[]>) > > but obviously we can't do that (directly). > > > The original API had only: > > Stream<U> flatMap(MultiFunction<T, U> mf) > > where MultiFunction was (T, Consumer<U>) -> void. If users already had > a Collection lying around, they had to iterate it themselves: > > (element, sink) -> { > for (U u : findCollection(element)) > sink.accept(u); > } > > which isn't terrible but people didn't like it -- I think not because it > was hard to read, but hard to figure out how to use flatMap at all.
> > The current iteration provides a helper class with helper methods for > handling collections, arrays, and streams, but you still have to wrap > your head around why you're being passed two things before doing > anything -- and I think it's the "before doing anything" part that > really messes people up. > > > So, here are two alternatives that I hope may be better (and not run into > problems with type inference). > > Alternative A: overloading on method names. > > // Map T -> Collection<U> > public StreamA<U> explodeToCollection(Function<T, Collection<U>> > mapper); > > // Map T -> U[] > public StreamA<U> explodeToArray(Function<T, U[]> mapper); > > // Generator case -- pass a T and a Consumer > public StreamA<U> explodeToConsumer(BiConsumer<T, Consumer<U>> > mapper); > > // Alternate version of generator case -- with named SAM instead > public StreamA<U> altExplodeToConsumer(Exploder<T, U> mapper); > > interface Exploder<T, U> { > void explode(T element, Consumer<U> consumer); > } > > Here, we have various explodeToXxx methods (naming is purely > illustrative) that defeat the erasure problem. Users seeking the > T->Collection version can use the appropriate versions with no problem. > When said users discover that their performance sucks, they have > motivation to learn to use the more efficient generator version. > > Usage examples: > > StreamA<Integer> a1 > = a.explodeToArray(i -> new Integer[] { i }); > StreamA<Integer> a2 > = a.explodeToCollection(i -> Collections.singleton(i)); > StreamA<Integer> a3 > = a.explodeToConsumer((i, sink) -> sink.accept(i)); > > > Alternative B: overload on SAMs.
This involves three SAMs: > > interface Exploder<T, U> { > void explode(T element, Consumer<U> consumer); > } > > interface MapperToCollection<T, U> > extends Function<T, Collection<U>> { } > > interface MapperToArray<T, U> extends Function<T, U[]> { } > > And three overloaded explode() methods: > > public StreamB<U> explode(MapperToCollection<T, U> exploder); > > public StreamB<U> explode(MapperToArray<T, U> exploder); > > public StreamB<U> explode(Exploder<T, U> exploder); > > Usage examples: > > StreamB<Integer> b1 = b.explode(i -> new Integer[] { i }); > StreamB<Integer> b2 = b.explode(i -> Collections.singleton(i)); > StreamB<Integer> b3 = b.explode((i, sink) -> sink.accept(i)); > > > I think the second approach is pretty decent. Users can easily > understand the first two versions and use them while wrapping their head > around the third. > > From forax at univ-mlv.fr Wed Feb 6 14:50:24 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 06 Feb 2013 23:50:24 +0100 Subject: explode In-Reply-To: <5112D9FC.9010707@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> Message-ID: <5112DE30.7030704@univ-mlv.fr> On 02/06/2013 11:32 PM, Brian Goetz wrote: > Guys, we need to close on the open Stream API items relatively soon. > Maybe we're almost there on flatMap. > > Of the alternatives for flatMap below, I think while Alternative B is > attractive from a client code perspective, I think Alternative A is > less risky with respect to stressing the compiler (and also introduces > fewer new types.) > > So, semi-concrete proposal: > > Stream<U> flatMapToCollection(Function<T, Collection<U>>) > Stream<U> flatMapToArray(Function<T, U[]>) // do we even need this? > Stream<U> flatMap(Function<T, Stream<U>>) > Stream<U> flatMap(FlatMapper<T, U>) What about consistency? You said that we should not use Collection explicitly in the stream API hence we don't have toList(), toSet(), or groupBy() but collect(toList()), collect(toSet()) or collect(groupingBy) and at the same time, for flatMap which will be less used, you want to add flatMapToCollection, flatMapToArray.
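For reference, the collect(toList()) style Remi is citing is exactly how the shipped API keeps Collection off the Stream interface: the conveniences live on Collectors and are reached through the single collect() method. A sketch:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectStyle {
    public static void main(String[] args) {
        // Conveniences live on Collectors, not Stream:
        // collect(toList()) rather than a toList() method on Stream itself.
        List<String> l = Stream.of("a", "b").collect(Collectors.toList());
        Set<String> s = Stream.of("a", "b", "a").collect(Collectors.toSet());
        Map<Integer, List<String>> byLen = Stream.of("a", "bb", "cc")
                .collect(Collectors.groupingBy(String::length));

        System.out.println(l + " " + byLen.get(2));
    }
}
```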
I think you should be at least consistent, so either we have an Exploder like we have a Collector, or we have several overloads for flatMap, groupBy and toList/toSet. > > where > > interface FlatMapper { > void explodeInto(T t, Consumer consumer); > } > > with specializations for primitives: > > IntStream flatMap(FlatMapper.OfInt) > ... etc > > We can then position flatMap as the "advanced" version, so from a > "graduated learning" perspective, people will find fMTC first, if that > meets their needs, great, and the Javadoc for fMTC can guide them to > fM for the more advanced cases. R?mi > > > > On 2/4/2013 3:37 PM, Brian Goetz wrote: >>> From this, here's what I think is left to do: >>> - More work on explode needed >> > ... >> >> Circling back to this. Clearly explode() is not done. Let me try and >> capture all the relevant info in one place. >> >> Let's start with some background. Why do we want this method at all? >> Well, it's really useful! It's fairly common to do things like: >> >> Stream orders = ... >> Stream lineItems // explicit declaration for clarity >> = orders.explode(... order.getLineItems() ...) >> >> and it is often desirable to do then streamy things on the stream of >> line items. Those who have used flatMap in Scala get used to having it >> quite quickly, and would be very sad if it were taken away. Ask Don how >> many examples in his katas use it. (Doug will also point out that if >> you have flatMap, you don't really need map or filter -- see examples in >> CHM -- since both can be layered atop flatMap, modulo performance >> concerns.) >> >> (It does have the potential to put some stress on the system when an >> element can be mapped to very large collections, because it sucks you >> into the problem of nested parallelism. 
(This is the inverse of another >> problem we already have, which is when filter stages have very high >> selectivity, and we end up with a lot of splitting overhead for the >> number of elements that end up at the tail of the pipeline.) But when >> mapping an element to a small number of other elements, as is common in >> a lot of use cases, there is generally no problem here.) >> >> Scala has a method flatMap on whatever they'd call Stream, which >> takes a function >> >> T -> Stream >> >> and produces a Stream. More generally, this shape of flatMap applies >> (and is supported by higher-kinded generics) to many traits in the Scala >> library, where you have a Foo[A] and the flatMap method takes an A -> >> Foo[B] and produces a Foo[B]. (Our generics can't capture this.) >> >> This is the shape for flatMap that everyone really wants. But here, we >> run into unfortunate reality: this works great in functional languages >> with cheap structural types, but that's not Java. >> >> I took it as a key design goal for flatMap/mapMulti/explode that: if an >> element t maps to nothing, the implementation should do as close to zero >> work as possible. >> >> This rules out shaping flatMap as: >> >> Stream flatMap(Function>) >> >> because, if you don't already have the collection lying around, the >> lambdas for this are nasty to write (try writing one, and you'll see >> what I mean), inefficient to execute, and create work for the library to >> iterate the result. In the limit, where t maps to the empty collection, >> creating an iterating an empty collection for each element is nuts. (By >> contrast, in languages like Haskell, wrapping elements with lists is >> very cheap.) >> >> However, the above shape is desirable as a convenience in the case you >> already do have a collection lying around. So let's put it in the bin >> of "nice conveniences to also deliver when we solve the main problem." 
>> >> It also rules out shaping flatMap as: >> >> Stream flatMap(Function>) >> >> because that's even worse -- creating ad-hoc streams is even more >> expensive than creating collections. >> >> To simplify, imagine there are two use cases we have to satisfy: >> >> - map element to generator (general case) >> - map element to collection (convenience case) >> >> The other cases (to array, to stream) are similar enough to the >> collection case. >> >> To illustrate the general "generator" case, here's an example of a >> lambda (using the current API) that takes a Stream of String and >> produces a Stream of Integer values which are the characters of that >> stream: >> >> (sink, element) -> { >> for (int i=0; i> sink.send(element.charAt(i)); >> } >> >> It's efficient (no input, no output) and pretty easy to see what's going >> on. Would be nicer if we could spell "sink.send" as "yield", but oh >> well. Here's how we'd have to write that if we didn't support the >> generator case: >> >> (element) -> { >> ArrayList list = new ArrayList<>(); >> for (int i=0; i> list.add(element.charAt(i)); >> return list; >> } >> >> Bigger and less efficient. And it gets uglier if we want to try and >> optimize away the list creation in the empty case: >> >> (element) -> { >> if (element.length() == 0) >> return Collections.emptyList(); >> ArrayList list = new ArrayList<>(); >> for (int i=0; i> list.add(element.charAt(i)); >> return list; >> } >> >> We're really starting to lose sight of what this lambda does. (Hopefully >> this will put to bed the notion that all we need is the T->Collection >> case.) >> >> Erasure plays a role here too. Ideally, it would be nice to overload >> methods for >> >> flatMap(Function>) >> flatMap(Function> >> but obviously we can't do that (directly). >> >> >> The original API had only: >> >> Stream flatMap(MultiFunction mf) >> >> where MultiFunction was (T, Consumer) -> void. 
If users already had >> a Collection lying around, they had to iterate it themselves: >> >> (element, sink) -> { >> for (U u : findCollection(t)) >> sink.accept(u); >> } >> >> which isn't terrible but people didn't like it -- I think not because it >> was hard to read, but hard to figure out how to use flatMap at all. >> >> The current iteration provides a helper class with helper methods for >> handling collections, arrays, and streams, but you still have to wrap >> your head around why you're being passed two things before doing >> anything -- and I think its the "before doing anything" part that >> really messes people up. >> >> >> So, here's two alternatives that I hope may be better (and not run into >> problems with type inference). >> >> Alternative A: overloading on method names. >> >> // Map T -> Collection >> public StreamA explodeToCollection(Function> >> mapper); >> >> // Map T -> U[] >> public StreamA explodeToArray(Function mapper); >> >> // Generator case -- pass a T and a Consumer >> public StreamA explodeToConsumer(BiConsumer> >> mapper); >> >> // Alternate version of generator case -- with named SAM instead >> public StreamA altExplodeToConsumer(Exploder mapper); >> >> interface Exploder { >> void explode(T element, Consumer consumer); >> } >> >> Here, we have various explodeToXxx methods (naming is purely >> illustrative) that defeat the erasure problem. Users seeking the >> T->Collection version can use the appropriate versions with no problem. >> When said users discover that their performance sucks, they have >> motivation to learn to use the more efficient generator version. >> >> Usage examples: >> >> StreamA a1 >> = a.explodeToArray(i -> new Integer[] { i }); >> StreamA a2 >> = a.explodeToCollection(i -> Collections.singleton(i)); >> StreamA a3 >> = a.explodeToConsumer((i, sink) -> sink.accept(i)); >> >> >> Alternative B: overload on SAMs. 
This involves three SAMs:
>>
>>     interface Exploder<T, U> {
>>         void explode(T element, Consumer<U> consumer);
>>     }
>>
>>     interface MapperToCollection<T, U> extends Function<T, Collection<U>> { }
>>
>>     interface MapperToArray<T, U> extends Function<T, U[]> { }
>>
>> And three overloaded explode() methods:
>>
>>     public <U> StreamB<U> explode(MapperToCollection<T, U> exploder);
>>     public <U> StreamB<U> explode(MapperToArray<T, U> exploder);
>>     public <U> StreamB<U> explode(Exploder<T, U> exploder);
>>
>> Usage examples:
>>
>>     StreamB<Integer> b1 = b.explode(i -> new Integer[] { i });
>>     StreamB<Integer> b2 = b.explode(i -> Collections.singleton(i));
>>     StreamB<Integer> b3 = b.explode((i, sink) -> sink.accept(i));
>>
>> I think the second approach is pretty decent. Users can easily
>> understand the first two versions and use them while wrapping their head
>> around the third.

From brian.goetz at oracle.com Wed Feb 6 15:30:15 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 06 Feb 2013 18:30:15 -0500
Subject: explode
In-Reply-To: <5112DE30.7030704@univ-mlv.fr>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr>
Message-ID: <5112E787.5090809@oracle.com>

> You said that we should not use Collection explicitly in the stream API
> hence we don't have toList(), toSet(), or groupBy() but
> collect(toList()), collect(toSet()) or collect(groupingBy)
> and at the same time, for flatMap which will be less used, you want to
> add flatMapToCollection, flatMapToArray.

Yes, any coupling to Collection is undesirable and has to be justified. We're currently in a nice place (zero uses of Collection in Stream) so it would be nice to stay there, and one is a lot worse than zero. But be careful not to turn consistency into a goal unto itself.
For example, the use of Collections in Collectors is an ideal compromise; the important thing is they are out of the core interface which we expect every aggregate for the next 10+ years to implement, but are still available for easy use through standalone static helper methods like groupingBy. This is an ideal balance of giving users tools to do their job without tying Stream to Collection.

> I think you should be at least consistent, so either we have an Exploder
> like we have a Collector,
> or we have several overloads for flatMap, groupBy and toList/toSet.

Personally, I would (fairly strongly) prefer to have only:

    Stream<U> flatMap(FlatMapper<T, U>)

and

    Stream<U> flatMap(Function<T, Stream<U>>)

One can quite easily derive the Collection (and with slightly more work, array) cases from the first form (or the second form, with more runtime overhead):

    .flatMap((t, sink) -> getColl(t).forEach(sink))
    .flatMap(t -> getColl(t).stream())

In fact, the first is what we originally had. But then people howled that (a) "I can't understand flatMap" and (b) "I think flatMap should take a Function<T, Collection<U>>". In our early focus groups, people saw the base form of flatMap and universally cried "WTF?" People can't understand it. After 100 people make the same comment, you start to get that it's a pain point.

So, the proposal I made today attempts to take into account that people are not yet ready to understand this form of flatMap, and attempts to compromise. But I'll happily retreat from that, and vote for just

    Stream<U> flatMap(FlatMapper<T, U>)
    Stream<U> flatMap(Function<T, Stream<U>>)

It just seemed people weren't OK with that. (Though to be fair, we didn't always have the second form, and its addition might be enough to avoid the need for the Collection and array forms. It also allows reclaiming of the good name "flatMap", since there is actual mapping going on, and the generator form can piggyback on that.)

So, +1 to Remi's implicit suggestion:

    Stream<U> flatMap(FlatMapper<T, U>)
    Stream<U> flatMap(Function<T, Stream<U>>)

That's the new proposal.
Will be carved in stone in 24h unless there is further discussion :) From forax at univ-mlv.fr Wed Feb 6 15:59:04 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 07 Feb 2013 00:59:04 +0100 Subject: explode In-Reply-To: <5112E787.5090809@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: <5112EE48.8060702@univ-mlv.fr> On 02/07/2013 12:30 AM, Brian Goetz wrote: >> You said that we should not use Collection explicitly in the stream API >> hence we don't have toList(), toSet(), or groupBy() but >> collect(toList()), collect(toSet()) or collect(groupingBy) >> and at the same time, for flatMap which will be less used, you want to >> add flatMapToCollection, flatMapToArray. > > Yes, any coupling to Collection is undesirable and has to be > justified. We're currently in a nice place (zero uses of Collection > in Stream) so it would be nice to stay there, and one is a lot worse > than zero. > > But be careful that you try to turn consistency into a goal unto > itself. For example, the use of Collections in Collectors is an ideal > compromise; the important thing is they are out of the core interface > which we expect every aggregate for the next 10+ years to implement, > but are still available for easy use through standalone static helper > methods like groupingBy. This is an ideal balance of giving users > tools to do their job without tying Stream to Collection. > >> I think you should be at least consistent, so either we have an Exploder >> like we have a Collector, >> or we have several overloads for flatMap, groupBy and toList/toSet. 
>
> Personally, I would (fairly strongly) prefer to have only:
>
>     Stream<U> flatMap(FlatMapper<T, U>)
>
> and
>
>     Stream<U> flatMap(Function<T, Stream<U>>)
>
> One can quite easily derive the Collection (and with slightly more
> work, array) cases from the first form (or the second form, with more
> runtime overhead):
>
>     .flatMap((t, sink) -> getColl(t).forEach(sink))
>     .flatMap(t -> getColl(t).stream())
>
> In fact, the first is what we originally had. But then people howled
> that (a) "I can't understand flatMap" and (b) "I think flatMap should
> take a Function<T, Collection<U>>". In our early focus groups, people
> saw the base form of flatMap and universally cried "WTF?" People
> can't understand it. After 100 people make the same comment, you
> start to get that it's a pain point.
>
> So, the proposal I made today attempts to take into account that
> people are not yet ready to understand this form of flatMap, and
> attempts to compromise. But I'll happily retreat from that, and vote
> for just
>
>     Stream<U> flatMap(FlatMapper<T, U>)
>     Stream<U> flatMap(Function<T, Stream<U>>)
>
> It just seemed people weren't OK with that. (Though to be fair, we
> didn't always have the second form, and its addition might be enough
> to avoid the need for the Collection and array forms. It also allows
> reclaiming of the good name "flatMap", since there is actual mapping
> going on, and the generator form can piggyback on that.)
>
> So, +1 to Remi's implicit suggestion:
>
>     Stream<U> flatMap(FlatMapper<T, U>)
>     Stream<U> flatMap(Function<T, Stream<U>>)
>
> That's the new proposal.
>
> Will be carved in stone in 24h unless there is further discussion :)

I will vote for this if FlatMapper also defines static methods to see a function to a collection or to an array as a FlatMapper.

    interface FlatMapper<T, U> {
        public void explodeInto(T t, Consumer<U> consumer);

        public static <T, U> FlatMapper<T, U> explodeCollection(Function<T, Collection<U>> function) {
            return (element, consumer) -> function.apply(element).forEach(consumer);
        }
        ...
} so one can write: stream.flatMap(explodeCollection(Person::getFriends)) R?mi From kevinb at google.com Wed Feb 6 16:05:01 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 6 Feb 2013 16:05:01 -0800 Subject: explode In-Reply-To: <5112E787.5090809@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: On Wed, Feb 6, 2013 at 3:30 PM, Brian Goetz wrote: Stream flatMap(FlatMapper) > Stream flatMap(Function>) > To make sure I understand: would these two behave identically? Would they imaginably perform comparably? foos.stream().flatMap((t, consumer) -> t.somethingThatGivesAStream().forEach(consumer)) foos.stream().flatMap(t -> t.somethingThatGivesAStream()) Second question, why "FlatMapper.OfInt" here, but "IntSupplier" etc. elsewhere? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Wed Feb 6 16:06:37 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 07 Feb 2013 01:06:37 +0100 Subject: One import static to rule them all Message-ID: <5112F00D.4010506@univ-mlv.fr> I wonder if we should not create one artificial interface that extends Collector, FlatMapper, etc, i.e. every interfaces that declare static methods that can be used by the Stream API just because it will be easier to do an import static on this interface. interface StaticDefaults // better name needed extends Collector, FlatMapper { } otherwise, every Java projects will define its own one. 
cheers, R?mi From brian.goetz at oracle.com Wed Feb 6 16:11:42 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 06 Feb 2013 19:11:42 -0500 Subject: explode In-Reply-To: <5112EE48.8060702@univ-mlv.fr> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5112EE48.8060702@univ-mlv.fr> Message-ID: <5112F13E.2060807@oracle.com> > I will vote on this if FlatMapper also defines static methods to see a > function to a collection or to an array as a FlatMapper. Reasonable request. Where would they live? From brian.goetz at oracle.com Wed Feb 6 16:16:15 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 06 Feb 2013 19:16:15 -0500 Subject: explode In-Reply-To: References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: <5112F24F.8080308@oracle.com> > Stream flatMap(FlatMapper) > Stream flatMap(Function>) > > To make sure I understand: would these two behave identically? Would > they imaginably perform comparably? > > foos.stream().flatMap((t, consumer) -> > t.somethingThatGivesAStream().forEach(consumer)) > foos.stream().flatMap(t -> t.somethingThatGivesAStream()) Currently, they would behave identically. The T -> Stream form is not strictly necessary, since it can be written in terms of the other, but people will find it more convenient. One place where they might not behave identically in the future is that since streams are lazy, we might be able to make: integers.flatMap(i -> anInfiniteStream()).getFirst() actually terminate, whereas integers.flatMap((i,consumer) -> anInfiniteStream().forEach(consumer)).getFirst() will never terminate. So the laziness-preserving aspect of Stream is nice. The second would perform basically the same as the first. 
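Since FlatMapper never shipped, the equivalence Kevin asks about can only be checked by emulating the consumer-driven spelling on top of the final API; the sketch below (names are mine) does that with Stream.Builder and confirms the two spellings produce the same elements on a finite stream. The laziness caveat was real in practice: flatMap did not propagate short-circuiting into the inner stream until a fix that landed in JDK 10, so the infinite-inner-stream variant only terminates on newer runtimes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch comparing Kevin's two flatMap spellings on the shipped Java 8 API.
// "somethingThatGivesAStream" is stood in for by splitting a word into letters.
public class TwoSpellings {

    static Stream<String> letters(String s) {
        return Arrays.stream(s.split(""));
    }

    // Spelling 1: consumer-driven (emulating the proposed FlatMapper form).
    static List<String> viaConsumer(List<String> words) {
        BiConsumer<String, Consumer<String>> fm =
                (t, consumer) -> letters(t).forEach(consumer);
        return words.stream()
                .flatMap(t -> {
                    Stream.Builder<String> b = Stream.builder();
                    fm.accept(t, b);
                    return b.build();
                })
                .collect(Collectors.toList());
    }

    // Spelling 2: stream-returning, as shipped.
    static List<String> viaStream(List<String> words) {
        return words.stream()
                .flatMap(TwoSpellings::letters)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("ab", "c");
        System.out.println(viaConsumer(in));
        System.out.println(viaStream(in));
    }
}
```

On finite inputs the two are observably identical; the difference is purely in how eagerly the inner elements are produced.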
But neither would perform as well as actually generating the results directly into the consumer. > Second question, why "FlatMapper.OfInt" here, but "IntSupplier" etc. > elsewhere? Depends where FlatMapper lives. If FlatMapper is a general SAM, then it would go in j.u.f. and we'd definitely use the IntFlatMapper convention. However, I would lean towards making FlatMapper a type in j.u.s., in which case the naming convention more prevalent there is to use nested OfXxx classes. From kevinb at google.com Wed Feb 6 16:28:41 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 6 Feb 2013 16:28:41 -0800 Subject: One import static to rule them all In-Reply-To: <5112F00D.4010506@univ-mlv.fr> References: <5112F00D.4010506@univ-mlv.fr> Message-ID: I have been promised that this won't work -- that to invoke a static method on an interface one *must* refer to the exact interface it was defined on, not a subtype, not an instance. Can someone please confirm this is true? On Wed, Feb 6, 2013 at 4:06 PM, Remi Forax wrote: > I wonder if we should not create one artificial interface that extends > Collector, FlatMapper, etc, > i.e. every interfaces that declare static methods that can be used by the > Stream API > just because it will be easier to do an import static on this interface. > > interface StaticDefaults // better name needed > extends Collector, FlatMapper { > } > > otherwise, every Java projects will define its own one. > > cheers, > R?mi > > -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From forax at univ-mlv.fr Wed Feb 6 16:36:46 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 07 Feb 2013 01:36:46 +0100 Subject: explode In-Reply-To: <5112F13E.2060807@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5112EE48.8060702@univ-mlv.fr> <5112F13E.2060807@oracle.com> Message-ID: <5112F71E.4090509@univ-mlv.fr> On 02/07/2013 01:11 AM, Brian Goetz wrote: >> I will vote on this if FlatMapper also defines static methods to see a >> function to a collection or to an array as a FlatMapper. > > Reasonable request. Where would they live? > I hope that most of the static methods should live in their corresponding interface, it's easier for devs to find them, it's easier for IDE to auto-complete them. R?mi From forax at univ-mlv.fr Wed Feb 6 16:35:29 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 07 Feb 2013 01:35:29 +0100 Subject: One import static to rule them all In-Reply-To: References: <5112F00D.4010506@univ-mlv.fr> Message-ID: <5112F6D1.70803@univ-mlv.fr> On 02/07/2013 01:28 AM, Kevin Bourrillion wrote: > I have been promised that this won't work -- that to invoke a static > method on an interface one /must/ refer to the exact interface it was > defined on, not a subtype, not an instance. Can someone please > confirm this is true? We talk about a call like Interface.staticM(), but we never talk about the static import explicitly. So if someone does a static import on an interface, you suggest that the compiler should see only the static methods declared in the interface and all the static fields declared or inherited from inherited interfaces (for backward compat.) ? R?mi > > > > On Wed, Feb 6, 2013 at 4:06 PM, Remi Forax > wrote: > > I wonder if we should not create one artificial interface that > extends Collector, FlatMapper, etc, > i.e. 
every interfaces that declare static methods that can be used > by the Stream API > just because it will be easier to do an import static on this > interface. > > interface StaticDefaults // better name needed > extends Collector, FlatMapper { > } > > otherwise, every Java projects will define its own one. > > cheers, > R?mi > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From kevinb at google.com Wed Feb 6 18:09:32 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 6 Feb 2013 18:09:32 -0800 Subject: One import static to rule them all In-Reply-To: <5112F6D1.70803@univ-mlv.fr> References: <5112F00D.4010506@univ-mlv.fr> <5112F6D1.70803@univ-mlv.fr> Message-ID: On Wed, Feb 6, 2013 at 4:35 PM, Remi Forax wrote: We talk about a call like Interface.staticM(), but we never talk about the > static import explicitly. > So if someone does a static import on an interface, you suggest that the > compiler should see only the static methods declared in the interface and > all the static fields declared or inherited from inherited interfaces (for > backward compat.) ? > I would certainly expect that what I can static-import, I could also invoke directly, yes. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From maurizio.cimadamore at oracle.com Thu Feb 7 02:29:45 2013 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 07 Feb 2013 10:29:45 +0000 Subject: explode In-Reply-To: <5112D9FC.9010707@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> Message-ID: <51138219.3030001@oracle.com> On 06/02/13 22:32, Brian Goetz wrote: > Guys, we need to close on the open Stream API items relatively soon. > Maybe we're almost there on flatMap. 
>
> Of the alternatives for flatMap below, I think while Alternative B is
> attractive from a client code perspective, I think Alternative A is
> less risky with respect to stressing the compiler (and also introduces
> fewer new types.)

Did you find cases where B doesn't work with the existing strategy? I think it won't stress the compiler more than what map does (actually it will do less so) - so, if we are fine with supporting map, I don't see big problems with this, complexity-wise.

Maurizio

> So, semi-concrete proposal:
>
>     Stream<U> flatMapToCollection(Function<T, Collection<U>>)
>     Stream<U> flatMapToArray(Function<T, U[]>)  // do we even need this?
>     Stream<U> flatMap(Function<T, Stream<U>>)
>     Stream<U> flatMap(FlatMapper<T, U>)
>
> where
>
>     interface FlatMapper<T, U> {
>         void explodeInto(T t, Consumer<U> consumer);
>     }
>
> with specializations for primitives:
>
>     IntStream flatMap(FlatMapper.OfInt)
>     ... etc
>
> We can then position flatMap as the "advanced" version, so from a
> "graduated learning" perspective, people will find fMTC first, if that
> meets their needs, great, and the Javadoc for fMTC can guide them to
> fM for the more advanced cases.
>
> On 2/4/2013 3:37 PM, Brian Goetz wrote:
>>> From this, here's what I think is left to do:
>>> - More work on explode needed
>> ...
>>
>> Circling back to this. Clearly explode() is not done. Let me try and
>> capture all the relevant info in one place.
>>
>> Let's start with some background. Why do we want this method at all?
>> Well, it's really useful! It's fairly common to do things like:
>>
>>     Stream<Order> orders = ...
>>     Stream<LineItem> lineItems  // explicit declaration for clarity
>>         = orders.explode(... order.getLineItems() ...)
>>
>> and it is often desirable to then do streamy things on the stream of
>> line items. Those who have used flatMap in Scala get used to having it
>> quite quickly, and would be very sad if it were taken away. Ask Don how
>> many examples in his katas use it.
>> (Doug will also point out that if you have flatMap, you don't really
>> need map or filter -- see examples in CHM -- since both can be layered
>> atop flatMap, modulo performance concerns.)
>>
>> (It does have the potential to put some stress on the system when an
>> element can be mapped to very large collections, because it sucks you
>> into the problem of nested parallelism. (This is the inverse of another
>> problem we already have, which is when filter stages have very high
>> selectivity, and we end up with a lot of splitting overhead for the
>> number of elements that end up at the tail of the pipeline.) But when
>> mapping an element to a small number of other elements, as is common in
>> a lot of use cases, there is generally no problem here.)
>>
>> Scala has a method flatMap on whatever they'd call Stream<T>, which
>> takes a function
>>
>>     T -> Stream<U>
>>
>> and produces a Stream<U>. More generally, this shape of flatMap applies
>> (and is supported by higher-kinded generics) to many traits in the Scala
>> library, where you have a Foo[A] and the flatMap method takes an A ->
>> Foo[B] and produces a Foo[B]. (Our generics can't capture this.)
>>
>> This is the shape for flatMap that everyone really wants. But here, we
>> run into unfortunate reality: this works great in functional languages
>> with cheap structural types, but that's not Java.
>>
>> I took it as a key design goal for flatMap/mapMulti/explode that: if an
>> element t maps to nothing, the implementation should do as close to zero
>> work as possible.
>>
>> This rules out shaping flatMap as:
>>
>>     Stream<U> flatMap(Function<T, Collection<U>>)
>>
>> because, if you don't already have the collection lying around, the
>> lambdas for this are nasty to write (try writing one, and you'll see
>> what I mean), inefficient to execute, and create work for the library to
>> iterate the result. In the limit, where t maps to the empty collection,
>> creating and iterating an empty collection for each element is nuts.
(By >> contrast, in languages like Haskell, wrapping elements with lists is >> very cheap.) >> >> However, the above shape is desirable as a convenience in the case you >> already do have a collection lying around. So let's put it in the bin >> of "nice conveniences to also deliver when we solve the main problem." >> >> It also rules out shaping flatMap as: >> >> Stream flatMap(Function>) >> >> because that's even worse -- creating ad-hoc streams is even more >> expensive than creating collections. >> >> To simplify, imagine there are two use cases we have to satisfy: >> >> - map element to generator (general case) >> - map element to collection (convenience case) >> >> The other cases (to array, to stream) are similar enough to the >> collection case. >> >> To illustrate the general "generator" case, here's an example of a >> lambda (using the current API) that takes a Stream of String and >> produces a Stream of Integer values which are the characters of that >> stream: >> >> (sink, element) -> { >> for (int i=0; i> sink.send(element.charAt(i)); >> } >> >> It's efficient (no input, no output) and pretty easy to see what's going >> on. Would be nicer if we could spell "sink.send" as "yield", but oh >> well. Here's how we'd have to write that if we didn't support the >> generator case: >> >> (element) -> { >> ArrayList list = new ArrayList<>(); >> for (int i=0; i> list.add(element.charAt(i)); >> return list; >> } >> >> Bigger and less efficient. And it gets uglier if we want to try and >> optimize away the list creation in the empty case: >> >> (element) -> { >> if (element.length() == 0) >> return Collections.emptyList(); >> ArrayList list = new ArrayList<>(); >> for (int i=0; i> list.add(element.charAt(i)); >> return list; >> } >> >> We're really starting to lose sight of what this lambda does. (Hopefully >> this will put to bed the notion that all we need is the T->Collection >> case.) >> >> Erasure plays a role here too. 
Ideally, it would be nice to overload >> methods for >> >> flatMap(Function>) >> flatMap(Function> >> but obviously we can't do that (directly). >> >> >> The original API had only: >> >> Stream flatMap(MultiFunction mf) >> >> where MultiFunction was (T, Consumer) -> void. If users already had >> a Collection lying around, they had to iterate it themselves: >> >> (element, sink) -> { >> for (U u : findCollection(t)) >> sink.accept(u); >> } >> >> which isn't terrible but people didn't like it -- I think not because it >> was hard to read, but hard to figure out how to use flatMap at all. >> >> The current iteration provides a helper class with helper methods for >> handling collections, arrays, and streams, but you still have to wrap >> your head around why you're being passed two things before doing >> anything -- and I think its the "before doing anything" part that >> really messes people up. >> >> >> So, here's two alternatives that I hope may be better (and not run into >> problems with type inference). >> >> Alternative A: overloading on method names. >> >> // Map T -> Collection >> public StreamA explodeToCollection(Function> >> mapper); >> >> // Map T -> U[] >> public StreamA explodeToArray(Function mapper); >> >> // Generator case -- pass a T and a Consumer >> public StreamA explodeToConsumer(BiConsumer> >> mapper); >> >> // Alternate version of generator case -- with named SAM instead >> public StreamA altExplodeToConsumer(Exploder mapper); >> >> interface Exploder { >> void explode(T element, Consumer consumer); >> } >> >> Here, we have various explodeToXxx methods (naming is purely >> illustrative) that defeat the erasure problem. Users seeking the >> T->Collection version can use the appropriate versions with no problem. >> When said users discover that their performance sucks, they have >> motivation to learn to use the more efficient generator version. 
>> >> Usage examples: >> >> StreamA a1 >> = a.explodeToArray(i -> new Integer[] { i }); >> StreamA a2 >> = a.explodeToCollection(i -> Collections.singleton(i)); >> StreamA a3 >> = a.explodeToConsumer((i, sink) -> sink.accept(i)); >> >> >> Alternative B: overload on SAMs. This involves three SAMs: >> >> interface Exploder { >> void explode(T element, Consumer consumer); >> } >> >> interface MapperToCollection >> extends Function> { } >> >> interface MapperToArray extends Function { } >> >> And three overloaded explode() methods: >> >> public StreamB explode(MapperToCollection exploder); >> >> public StreamB explode(MapperToArray exploder); >> >> public StreamB explode(Exploder exploder); >> >> Usage examples: >> >> StreamB b1 = b.explode(i -> new Integer[] { i }); >> StreamB b2 = b.explode(i -> Collections.singleton(i)); >> StreamB b3 = b.explode((i, sink) -> sink.accept(i)); >> >> >> I think the second approach is pretty decent. Users can easily >> understand the first two versions and use them while wrapping their head >> around the third. >> >> From tim at peierls.net Thu Feb 7 10:12:39 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 7 Feb 2013 13:12:39 -0500 Subject: explode In-Reply-To: <5112E787.5090809@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: On Wed, Feb 6, 2013 at 6:30 PM, Brian Goetz wrote: > Stream flatMap(FlatMapper) > Stream flatMap(Function>) > > That's the new proposal. > > Will be carved in stone in 24h unless there is further discussion :) > Still have six hours. :-) I hate to give up verbose/friendly flatMapToCollection entirely. It's not immediately obvious to me how to write it myself, and it feels as though it'll come up a lot. Even just an example in javadocs would help. 
--tim

From brian.goetz at oracle.com Thu Feb 7 10:16:15 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 13:16:15 -0500
Subject: explode
In-Reply-To: 
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com>
Message-ID: <5113EF6F.5060500@oracle.com>

I think the proposed solution there is:
- example in Javadoc, and/or static helper

    static <T, U> FlatMapper<T, U> flatMapperToCollection(Mapper<T, Collection<U>> m) {
        return (t, sink) -> m.apply(t).forEach(sink);
    }

So users can say

    stream.flatMap(flatMapperToCollection(t -> getColl(t)))

and the javadoc can point them to that.

On 2/7/2013 1:12 PM, Tim Peierls wrote:
> On Wed, Feb 6, 2013 at 6:30 PM, Brian Goetz wrote:
>
>     Stream<U> flatMap(FlatMapper<T, U>)
>     Stream<U> flatMap(Function<T, Stream<U>>)
>
>     That's the new proposal.
>
>     Will be carved in stone in 24h unless there is further discussion :)
>
> Still have six hours. :-)
>
> I hate to give up verbose/friendly flatMapToCollection entirely. It's
> not immediately obvious to me how to write it myself, and it feels as
> though it'll come up a lot. Even just an example in javadocs would help.
>
> --tim

From daniel.smith at oracle.com Thu Feb 7 10:22:11 2013
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 7 Feb 2013 11:22:11 -0700
Subject: One import static to rule them all
In-Reply-To: <5112F6D1.70803@univ-mlv.fr>
References: <5112F00D.4010506@univ-mlv.fr> <5112F6D1.70803@univ-mlv.fr>
Message-ID: <5505E6A5-AB90-41D0-A9E0-C8B7BEDDEC9A@oracle.com>

On Feb 6, 2013, at 5:35 PM, Remi Forax wrote:

> On 02/07/2013 01:28 AM, Kevin Bourrillion wrote:
>> I have been promised that this won't work -- that to invoke a static method on an interface one /must/ refer to the exact interface it was defined on, not a subtype, not an instance. Can someone please confirm this is true?
>
> We talk about a call like Interface.staticM(), but we never talk about the static import explicitly.
> So if someone does a static import on an interface, you suggest that the compiler should see only the static methods declared in the interface and all the static fields declared or inherited from inherited interfaces (for backward compat.) ?

The invocation restriction is imposed by defining inheritance such that the subinterface does not inherit its superinterface's members. So the parent's static methods are not members of the child. Static import, in turn, is defined in terms of the members of the child. (In fact, the inability to invoke via a child isn't really the goal -- it's more like a side effect. The main goal, I think, is to avoid lots of pain points that arise when we start to deal with multiple inheritance of static methods.)

--Dan

From tim at peierls.net Thu Feb 7 10:34:08 2013
From: tim at peierls.net (Tim Peierls)
Date: Thu, 7 Feb 2013 13:34:08 -0500
Subject: explode
In-Reply-To: <5113EF6F.5060500@oracle.com>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5113EF6F.5060500@oracle.com>
Message-ID: 

On Thu, Feb 7, 2013 at 1:16 PM, Brian Goetz wrote:

> I think the proposed solution there is:
> - example in Javadoc, and/or static helper...

OK, then. I was still having trouble following the semantics while names were being discussed, so I'm very late to the naming party. "flatMap" doesn't convey much to me, but I guess I could learn to use it. Here's my beef: A roughly analogous name in Guava is the unlovely but crystal clear "transformAndConcat". That's not accurate enough here, since the action is more general than concatenation, but something like "mapAndCollect" conveys the process in the right order -- first map, then collect the results -- which is something that "flatMap" gets backwards: "a flattening of mapped elements".
--tim From tim at peierls.net Thu Feb 7 10:41:10 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 7 Feb 2013 13:41:10 -0500 Subject: Collectors update redux In-Reply-To: <5112D53F.2080205@oracle.com> References: <5112D53F.2080205@oracle.com> Message-ID: Is three-arg collect really the target "on ramp"? I would have thought the first stop would be the combinators. OTOH ... there's a lot of stuff in there. I can think of uses for all of it, but I worry about someone faced with picking the right static factory method of Collectors. Maybe with the right class comment, users can be guided to the right combinator without having to know much. --tim On Wed, Feb 6, 2013 at 5:12 PM, Brian Goetz wrote: > Did more tweaking with Collectors. > > Recall there are two basic forms of the collect method: > > The most basic one is the "on ramp", which doesn't require any > understanding of Collector or the combinators therein; it is basically the > mutable version of reduce. It looks like: > > collect(() -> R, (R,T) -> void, (R,R) -> void) > > The API shape is defined so that most invocations will work with method > references: > > // To ArrayList > collect(ArrayList::new, ArrayList::add, ArrayList::addAll) > > Note that this works in parallel too; we create list at the leaves with > ::add, and merge them up the tree with ::addAll. > > // String concat > collect(StringBuilder::new, StringBuilder::append, > StringBuilder::append) > > // Turn an int stream to a BitSet with those bits set > collect(BitSet::new, BitSet::set, BitSet::or) > > // String join with delimiter > collect(() -> new StringJoiner(", "), StringJoiner::append, > StringJoiner::append) > > Again, all these work in parallel. > > Digression: the various forms of reduce/etc form a ladder in terms of > complexity: > > If you understand reduction, you can understand... > ...reduce(T, BinaryOperator) > > If you understand the above + Optional, you can then understand... 
> ...reduce(BinaryOperator<T>)
>
> If you understand the above + "fold" (nonhomogeneous reduction), you can
> then understand...
> ...reduce(U, BiFunction<U, T, U> accumulator, BinaryOperator<U>)
>
> If you understand the above + "mutable fold" (inject), you can then
> understand...
> ...collect(Supplier<R>, (R,T) -> void, (R,R) -> void)
>
> If you understand the above + "Collector", you can then understand...
> ...collect(Collector<T, R>)
>
> This is all supported by the principle of commensurate effort; learn a
> little more, can do a little more.
>
> OK, exiting digression, moving to the end of the list, those that use
> "canned" Collectors.
>
>     collect(Collector<T, R>)
>     collectUnordered(Collector<T, R>)
>
> Collectors are basically a tuple of three lambdas and a boolean indicating
> whether the Collector can handle concurrent insertion:
>
>     Collector<T, R> = { () -> R, (R,T) -> void, (R,R) -> R, isConcurrent }
>
> Note there is a slight difference in the last argument, a
> BinaryOperator<R> rather than a BiConsumer<R,R>. The BinaryOperator form
> is more flexible (it can support appending two Lists into a tree
> representation without copying the elements, whereas the (R,R) -> void form
> can't.) This asymmetry is a rough edge, though in each case, the shape is
> "locally" optimal (in the three-arg version, the void form supports method
> refs better; in the Collector version, the result is more flexible, and
> that's where we need the flexibility.) But we could make them consistent
> at the cost of the above uses becoming more like:
>
>     collect(StringBuilder::new, StringBuilder::append,
>             (l, r) -> { l.append(r); return l; })
>
> Overall I think the current API yields better client code at the cost of
> this slightly rough edge.
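The three-arg collect survived into Java 8 essentially as described, as Stream.collect(Supplier, BiConsumer, BiConsumer), though StringJoiner's methods ended up named add and merge rather than append. A quick check of the method-reference examples above against the shipped API:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// The "on ramp" collect form, exercised with the method references from
// the message. All of these also work on parallel streams: leaves are
// accumulated element-by-element, then merged up the tree by the combiner.
public class OnRampCollect {

    static List<String> toList(Stream<String> s) {
        return s.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    static String concat(Stream<String> s) {
        return s.collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    // Turn an int stream into a BitSet with those bits set.
    static BitSet toBitSet(IntStream s) {
        return s.collect(BitSet::new, BitSet::set, BitSet::or);
    }

    // String join with delimiter (shipped as add/merge, not append).
    static String join(Stream<String> s) {
        return s.collect(() -> new StringJoiner(", "), StringJoiner::add, StringJoiner::merge)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(toList(Stream.of("a", "b")));
        System.out.println(concat(Stream.of("a", "b")));
        System.out.println(join(Stream.of("a", "b")));
        System.out.println(toBitSet(IntStream.of(1, 3)));
    }
}
```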
> The set of Collectors now includes:
>     toCollection(Supplier<Collection>)
>     toList()
>     toSet()
>     toStringBuilder()
>     toStringJoiner(delimiter)
>
>     // mapping combinators (plus primitive specializations)
>     mapping(T->U, Collector)
>
>     // Single-level groupBy
>     groupingBy(T->K)
>
>     // groupBy with downstream Collector
>     groupingBy(T->K, Collector)
>
>     // grouping plus reduce
>     groupingReduce(T->K, BinaryOperator) // reduce only
>     groupingReduce(T->K, T->U, BinaryOperator) // map+reduce
>
>     // join (nee mappedTo)
>     joiningWith(T -> U) // produces Map<T,U>
>
>     // partition
>     partitioningBy(Predicate)
>     partitioningBy(Predicate, Collector)
>     partitioningReduce(Predicate<T>, BinaryOperator)
>     partitioningReduce(Predicate<T>, T->U, BinaryOperator)
>
>     // statistics (gathers sum, count, min, max, average)
>     toLongStatistics()
>     toDoubleStatistics()
>
> Plus, concurrent versions of most of these (which are suitable for unordered/contended/forEach-style execution.) Plus versions that let you offer explicit constructors for maps and collections. While these may seem like a lot, the implementations are highly compact -- all of these together, plus supporting machinery, fit in 500 LoC.
>
> These Collectors are designed around composability. (It is vaguely frustrating that we even have to separate the "with downstream Collector" versions from the reducing versions.) So they each have a form where you can do some level of categorization and then use a downstream collector to do further computation. This is very powerful.
> Examples, again using the familiar problem domain of transactions:
>
>     class Txn {
>         Buyer buyer();
>         Seller seller();
>         String description();
>         int amount();
>     }
>
> Transactions by buyer:
>
>     Map<Buyer, Collection<Txn>>
>         m = txns.collect(groupingBy(Txn::buyer));
>
> Highest-dollar transaction by buyer:
>
>     Map<Buyer, Txn>
>         m = txns.collect(
>             groupingReduce(Txn::buyer,
>                            Comparators.greaterOf(
>                                Comparators.comparing(Txn::amount))));
>
> Here, comparing() takes the Txn -> amount function, and produces a Comparator<Txn>; greaterOf(comparator) turns that Comparator into a BinaryOperator that corresponds to "max by comparator". We then reduce on that, yielding highest-dollar transaction per buyer.
>
> Alternately, if you want the number, not the transaction:
>
>     Map<Buyer, Integer>
>         m = txns.collect(groupingReduce(Txn::buyer, Txn::amount, Integer::max));
>
> Transactions by buyer, seller:
>
>     Map<Buyer, Map<Seller, Collection<Txn>>>
>         m = txns.collect(groupingBy(Txn::buyer, groupingBy(Txn::seller)));
>
> Transaction volume statistics by buyer, seller:
>
>     Map<Buyer, Map<Seller, LongStatistics>>
>         m = txns.collect(groupingBy(Txn::buyer,
>                                     groupingBy(Txn::seller,
>                                                mapping(Txn::amount,
>                                                        toLongStatistics()))));
>
> The statistics let you get at min, max, sum, count, and average from a single pass on the data (this trick taken from ParallelArray.)
>
> We can mix and match at various levels. For example:
>
> Transactions by buyer, partitioned into "large/small" groups:
>
>     Predicate<Txn> isLarge = t -> t.amount() > BIG;
>     Map<Buyer, Map<Boolean, Collection<Txn>>>
>         m = txns.collect(groupingBy(Txn::buyer, partitioningBy(isLarge)));
>
> Or, turning it around:
>
>     Map<Boolean, Map<Buyer, Collection<Txn>>>
>         m = txns.collect(partitioningBy(isLarge, groupingBy(Txn::buyer)));
>
> Because Collector is public, Kevin can write and publish Guava-multimap-bearing versions of these -- probably in about ten minutes.
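[An aside for readers following this thread against a current JDK: the three-arg "on ramp" shipped in Java 8 with essentially the shape discussed above, Stream.collect(Supplier, BiConsumer, BiConsumer), so the method-reference examples can be run directly. One naming caveat: the released StringJoiner spells its methods add and merge rather than append. A runnable sketch, not part of the original thread:]

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class CollectOnRamp {
    public static void main(String[] args) {
        // To ArrayList: leaves are built with ::add, merged up the tree with ::addAll.
        // Works in parallel; an ordered source keeps its encounter order.
        List<String> list = Stream.of("a", "b", "c")
                .parallel()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
        System.out.println(list);          // [a, b, c]

        // String concat
        StringBuilder sb = Stream.of("x", "y", "z")
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append);
        System.out.println(sb);            // xyz

        // Turn an int stream into a BitSet with those bits set
        BitSet bits = IntStream.of(1, 3, 5)
                .collect(BitSet::new, BitSet::set, BitSet::or);
        System.out.println(bits);          // {1, 3, 5}

        // String join with delimiter (released names: add/merge, not append)
        StringJoiner sj = Stream.of("a", "b", "c")
                .collect(() -> new StringJoiner(", "), StringJoiner::add, StringJoiner::merge);
        System.out.println(sj);            // a, b, c
    }
}
```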
From brian.goetz at oracle.com Thu Feb 7 10:54:36 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 13:54:36 -0500
Subject: explode
In-Reply-To:
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5113EF6F.5060500@oracle.com>
Message-ID: <5113F86C.7070900@oracle.com>

flatMap is indeed map+flatten, but unfortunately we cannot factor it into two steps because of erasure. (We can't make a method on Stream<Stream<T>> called flatten() that produces a Stream<T>.) The name flatMap is used in Scala, and while I'm not suggesting that this constitutes any sort of proof of suitability, it at least has some track record.

DIGRESSION

More generally, flatMap is commonly used to name the bind operator of a monad, where you have a type Foo<T>, and flatMap has the signature:

    Foo<U> flatMap(T -> Foo<U>)

IF we are to use the name flatMap, I feel it is important to at least have one overload that follows this naming pattern.

On 2/7/2013 1:34 PM, Tim Peierls wrote:
> On Thu, Feb 7, 2013 at 1:16 PM, Brian Goetz wrote:
>
>     I think the proposed solution there is:
>     - example in Javadoc, and/or static helper...
>
> OK, then.
>
> I was still having trouble following the semantics while names were being discussed, so I'm very late to the naming party. "flatMap" doesn't convey much to me, but I guess I could learn to use it. Here's my beef: A roughly analogous name in Guava is the unlovely but crystal clear "transformAndConcat". That's not accurate enough here, since the action is more general than concatenation, but something like "mapAndCollect" conveys the process in the right order -- first map, then collect the results -- which is something that "flatMap" gets backwards: "a flattening of mapped elements".
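[The bind shape Brian describes above is the one that eventually shipped: Stream<U> flatMap(Function<T, Stream<U>>). A small sketch of both halves of the map-then-flatten intuition, using made-up sample data; not part of the original thread:]

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapShape {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "c d e");

        // flatMap: each line maps to a stream of words, and the results
        // are flattened into a single Stream<String>.
        List<String> words = lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
        System.out.println(words);           // [a, b, c, d, e]

        // With map alone the result is a Stream<Stream<String>> -- exactly
        // the type that erasure prevents giving its own flatten() method.
        Stream<Stream<String>> nested = lines.stream()
                .map(line -> Arrays.stream(line.split(" ")));
        System.out.println(nested.count());  // 2
    }
}
```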
> > --tim From kevinb at google.com Thu Feb 7 10:56:00 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 7 Feb 2013 10:56:00 -0800 Subject: Collectors update redux In-Reply-To: References: <5112D53F.2080205@oracle.com> Message-ID: On Thu, Feb 7, 2013 at 10:41 AM, Tim Peierls wrote: Is three-arg collect really the target "on ramp"? IF you've been successfully spoon-fed the excellent examples (bitset etc.) then you can see it as reasonably simple. Otherwise you're pretty lost in the woods. > I would have thought the first stop would be the combinators. OTOH ... > there's a lot of stuff in there. I think there is *way* too much stuff in there, and I don't have enough time to even review it all before it gets set in stone. I strongly believe we would be smarter to keep the set of prepackaged Collectors much smaller and let third-party libraries experiment with which Collectors to provide.* * And, no, it's not that I *want* more code that Guava will have to build and maintain. It just seems far safer and more appropriate. JDK only needs the big ones -- a few versions of groupingBy, a few others, done. It's harder to leave out Stream methods, but these are just static things anyone could provide. > I can think of uses for all of it, but I worry about someone faced with > picking the right static factory method of Collectors. Maybe with the right > class comment, users can be guided to the right combinator without having > to know much. > > --tim > > > > On Wed, Feb 6, 2013 at 5:12 PM, Brian Goetz wrote: > >> Did more tweaking with Collectors. >> >> Recall there are two basic forms of the collect method: >> >> The most basic one is the "on ramp", which doesn't require any >> understanding of Collector or the combinators therein; it is basically the >> mutable version of reduce. 
>> It looks like: [...]

-- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From tim at peierls.net Thu Feb 7 10:59:22 2013
From: tim at peierls.net (Tim Peierls)
Date: Thu, 7 Feb 2013 13:59:22 -0500
Subject: explode
In-Reply-To: <5113F86C.7070900@oracle.com>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5113EF6F.5060500@oracle.com> <5113F86C.7070900@oracle.com>
Message-ID:

On Thu, Feb 7, 2013 at 1:54 PM, Brian Goetz wrote:
> flatMap is indeed map+flatten, but unfortunately we cannot factor it into two steps because of erasure. (We can't make a method on Stream<Stream<T>> called flatten() that produces a Stream<T>.)

I wasn't suggesting that the steps could be factored, only that a name that suggests the intuitive order of the steps stands a better chance of being understood and used by newbies.

--tim

From brian.goetz at oracle.com Thu Feb 7 11:12:51 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 14:12:51 -0500
Subject: Collectors update redux
In-Reply-To:
References: <5112D53F.2080205@oracle.com>
Message-ID: <5113FCB3.80005@oracle.com>

> Is three-arg collect really the target "on ramp"?

Sorry, I was probably not clear. It is the onramp to the mutable part of the reduce functionality, but it builds on the more functional flavors, as outlined in the "digression" section.

> IF you've been successfully spoon-fed the excellent examples (bitset etc.) then you can see it as reasonably simple. Otherwise you're pretty lost in the woods.

I think that's fair. Which points, as we've already agreed, to the fact that this is mostly a pedagogical problem.

> I would have thought the first stop would be the combinators. OTOH ... there's a lot of stuff in there.
>
> I think there is *way* too much stuff in there, and I don't have enough time to even review it all before it gets set in stone.
> I strongly believe we would be smarter to keep the set of prepackaged Collectors much smaller and let third-party libraries experiment with which Collectors to provide.

Conceptually, the set is pretty simple:

    base collectors == toCollection, toStatistics, toStringBuilder,
                       joinedWith (takes Stream<T> plus T->U, produces Map<T,U>)
    combinator for map+collector
    combinator for groupBy+collector
    combinator for groupBy+reduce
    combinator for partition+collector
    combinator for partition+reduce

plus defaults for the above where, if you don't have a downstream collector, it assumes "toCollection" (e.g., the no-arg groupBy).

Individually, each of these is dead-simple both in concept and implementation (once you understand Collector) -- even the most complex are only 20 LoC, and many are 1-2 LoC. I think what creates the perception of complexity is the number of forms that jumps out at you on the Javadoc page?

The one place where we might consider reducing scope is by eliminating the forms that take an explicit Supplier. In other words, you always get a HashMap / ConcurrentHashMap. This cuts the number of groupBy/join forms in half. But it leaves those who want, say, to group to a TreeMap out in the cold. Do we feel that would be an improvement?

Alternately, we can refactor the Map-driven collectors so that instead of the Supplier being an argument, it can be a method on the Collector:

    collect(groupingBy(Txn::buyer).usingMap(TreeMap::new))

by having a ToMapCollector (extends Collector) with a usingMap() method. This again gets us a nearly 2x reduction in number of methods in Collectors, at the cost of moving the "pick your own map" functionality to somewhere else.
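[For the record, the released Collectors API kept the explicit-Supplier route rather than a usingMap() builder: groupingBy gained a three-argument overload taking a map factory. A sketch of "group to a TreeMap" with a made-up word list; not part of the original thread:]

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupToTreeMap {
    public static void main(String[] args) {
        // Two-arg form: the implementation picks the Map for you.
        Map<Character, List<String>> byFirst =
                Stream.of("cherry", "apple", "banana", "apricot")
                      .collect(Collectors.groupingBy(w -> w.charAt(0)));
        System.out.println(byFirst.get('a'));    // [apple, apricot]

        // Three-arg form: the caller supplies the Map, here a sorted TreeMap.
        TreeMap<Character, List<String>> sorted =
                Stream.of("cherry", "apple", "banana", "apricot")
                      .collect(Collectors.groupingBy(
                              w -> w.charAt(0),       // classifier
                              TreeMap::new,           // explicit map factory
                              Collectors.toList()));  // downstream collector
        System.out.println(sorted.firstKey());   // a
    }
}
```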
From joe.bowbeer at gmail.com Thu Feb 7 11:13:45 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 7 Feb 2013 11:13:45 -0800 Subject: explode In-Reply-To: References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5113EF6F.5060500@oracle.com> <5113F86C.7070900@oracle.com> Message-ID: I think flatmap is odd and I like Tim's suggestion. However if there is a choice between two odd names then I prefer flatmap because I've encountered it before in various functional contexts. On Feb 7, 2013 10:59 AM, "Tim Peierls" wrote: > On Thu, Feb 7, 2013 at 1:54 PM, Brian Goetz wrote: > >> flatMap is indeed map+flatten, but unfortunately we cannot factor it into >> two steps because of erasure. (We can't make a method on >> Stream> called flatten() that produces a Stream.) >> > > I wasn't suggesting that the steps could be factored, only that a name > that suggests the intuitive order of the steps stands a better chance of > being understood and used by newbies. > > --tim > From brian.goetz at oracle.com Thu Feb 7 11:34:10 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 07 Feb 2013 14:34:10 -0500 Subject: Collectors update redux In-Reply-To: References: <5112D53F.2080205@oracle.com> Message-ID: <511401B2.6040003@oracle.com> > I think there is *way* too much stuff in there, and I don't have enough > time to even review it all before it gets set in stone. "Too much stuff here" is kind of vague. Is the concern that some of the operations (e.g., partition) are just too niche to carry their weight? Or not fully baked as concepts? Or are some so obvious that we just expect people to write it themselves if they need it? Is the concern that there are too many forms of each operation, and that the user will be bewildered by the variety? Is it the complex interaction of {concurrent, ordered}? Can you point to a few examples of methods you would eliminate? 
Maybe we can induct to a pattern from there. From brian.goetz at oracle.com Thu Feb 7 11:53:30 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 07 Feb 2013 14:53:30 -0500 Subject: Fwd: hg: lambda/lambda/jdk: Replace explode() with two forms of flatMap: flatMap(T->Stream), and flatMap(FlatMapper) In-Reply-To: <20130207193737.448A0478E8@hg.openjdk.java.net> References: <20130207193737.448A0478E8@hg.openjdk.java.net> Message-ID: <5114063A.3000203@oracle.com> I pushed an update along the lines of what was discussed yesterday, so people can take a look. Summary: - Eliminated "Downstream" abstraction - Added FlatMapper type (with nested specializations) in j.u.s. - Added five forms of Stream.flatMap flatMap(Function) flatMap(FlatMapper) flatMap(FlatMapper.To{Int,Long,Double}) - Added one form of flatMap for each primitive stream: {Int,Long,Double}Stream.flatMap(FlatMapper.{ILD}To{ILD}) Check it out and see what you think. Commit message attached. I think this is an improvement. Bikeshedding on naming can continue :) -------- Original Message -------- Subject: hg: lambda/lambda/jdk: Replace explode() with two forms of flatMap: flatMap(T->Stream), and flatMap(FlatMapper) Date: Thu, 07 Feb 2013 19:37:14 +0000 From: brian.goetz at oracle.com To: lambda-dev at openjdk.java.net Changeset: 3aed6b4f4d42 Author: briangoetz Date: 2013-02-07 14:36 -0500 URL: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3aed6b4f4d42 Replace explode() with two forms of flatMap: flatMap(T->Stream), and flatMap(FlatMapper) ! src/share/classes/java/util/stream/DoublePipeline.java ! src/share/classes/java/util/stream/DoubleStream.java + src/share/classes/java/util/stream/FlatMapper.java ! src/share/classes/java/util/stream/IntPipeline.java ! src/share/classes/java/util/stream/IntStream.java ! src/share/classes/java/util/stream/LongPipeline.java ! src/share/classes/java/util/stream/LongStream.java ! src/share/classes/java/util/stream/ReferencePipeline.java ! 
src/share/classes/java/util/stream/Stream.java ! test-ng/bootlib/java/util/stream/LambdaTestHelpers.java ! test-ng/boottests/java/util/stream/SpinedBufferTest.java ! test-ng/tests/org/openjdk/tests/java/util/stream/ExplodeOpTest.java ! test-ng/tests/org/openjdk/tests/java/util/stream/ToArrayOpTest.java ! test/java/util/LambdaUtilities.java ! test/java/util/stream/Stream/EmployeeStreamTest.java ! test/java/util/stream/Stream/IntStreamTest.java ! test/java/util/stream/Stream/IntegerStreamTest.java ! test/java/util/stream/Stream/StringBuilderStreamTest.java ! test/java/util/stream/Streams/BasicTest.java From kevinb at google.com Thu Feb 7 12:25:52 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 7 Feb 2013 12:25:52 -0800 Subject: Collectors update redux In-Reply-To: <511401B2.6040003@oracle.com> References: <5112D53F.2080205@oracle.com> <511401B2.6040003@oracle.com> Message-ID: On Thu, Feb 7, 2013 at 11:34 AM, Brian Goetz wrote: I think there is *way* too much stuff in there, and I don't have enough >> time to even review it all before it gets set in stone. >> > > "Too much stuff here" is kind of vague. > > Is the concern that some of the operations (e.g., partition) are just too > niche to carry their weight? Or not fully baked as concepts? > > Or are some so obvious that we just expect people to write it themselves > if they need it? > > Is the concern that there are too many forms of each operation, and that > the user will be bewildered by the variety? > > Is it the complex interaction of {concurrent, ordered}? > > Can you point to a few examples of methods you would eliminate? Maybe we > can induct to a pattern from there. > So... This illustrates the problem I'm talking about. You're implying "we need a specific argument to justify leaving X out" and the further implication is that if you feel you can refute that argument, it stays in. That's the opposite of how it works in my project ... and we actually get to remove our mistakes later! 
Did I miss all the discussions where each of the 40 (!) static Collectors provided was carefully considered on its merits?

-- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com Thu Feb 7 14:24:24 2013
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 7 Feb 2013 14:24:24 -0800
Subject: Collectors update redux
In-Reply-To:
References: <5112D53F.2080205@oracle.com> <511401B2.6040003@oracle.com>
Message-ID:

Okay, I got a presentation over with that was stressing me out and returned to this. :-) I think I've spoken too broadly and been unfair to a degree. I'll start a new thread soon with a more constructive approach.

On Thu, Feb 7, 2013 at 12:25 PM, Kevin Bourrillion wrote:
> [...]
-- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Thu Feb 7 17:11:44 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 20:11:44 -0500
Subject: Collectors update redux
In-Reply-To:
References: <5112D53F.2080205@oracle.com>
Message-ID: <511450D0.1040607@oracle.com>

> I can think of uses for all of it, but I worry about someone faced with picking the right static factory method of Collectors. Maybe with the right class comment, users can be guided to the right combinator without having to know much.

It's worth noting that the only method that is really needed is:

    <R> R reduce(Supplier<R> factory, BiFunction<R,T,R> reducer, BinaryOperator<R> combiner);

All the other forms of reduce/collect can be written in terms of this one -- though some are more awkward than others. Similarly, all the Collectors are just "macros" for specific combinations of inputs to this form of reduce. And, as to the Collectors, groupBy can be written in terms of groupingReduce; partitioning is just grouping with a boolean-valued function; joiningWith is a form of groupingReduce too. We don't *need* any of them. They're all just reductions that can be expressed with the above form.

So we *could* boil everything down to just one method. But, of course, we should not, because the client code gets harder to write, harder to read, and more error-prone. Each "A can be written in terms of B" requires an "aha" that is obvious in hindsight but could well be slow in coming.

So it's really a question of "where do we turn the knob to." The forms of reduce we've got are a (non-orthogonal) set that are (subjectively) tailored to specific categories of perceived-to-be common situations. Similarly, the set of Collectors is based on having scoured various "100 cool examples with " to distill out common use cases.
None of the Collectors add any "power" in the sense they can all be written as raw reduce; but they do add expressiveness. Each one you take away makes some clearly imaginable use case harder. And each one you add moves us closer to combinator overload. For example, suppose we take away mapping(T->U, Collector). The user wants to compute "average sale by salesman". He sees groupBy(Txn::seller), but that gives him a Collection, not what he wants. He sees groupBy(Txn::seller, Collector), and he sees toStatistics which will give him the average/min/max he wants, but he can't bridge the two. So he has to either do it in two passes, or write his own averaging reducer. Which isn't terribly hard but he'd rather re-use the one in the library. Adding in mapping(T->U, Collector) lets him write .collect(groupBy(Txn::seller, mapping(Txn::amount, toLongStatistics))) .getMean() and be done -- and still readable -- and obviously correct. For every single one of these, we could make the argument "we don't need it because it's ten lines of code the user could write if he needs" (all the Collectors are tiny); then again for every single one of them, we could make the argument that it's self-contained and useful for realistic use cases. So in the end the "right" set will be highly subjective. Personally, I think we've got just about the right set of operations, but maybe too many flavors of each. (Note we already took away the flatMap-like flavors of groupBy, where each input element can be mapped to multiple output elements, which already cut the number of combinations in half.) And maybe we could cut back on the variations (e.g., eliminate the forms that let you provide your own Map constructor, and you always just get a HashMap.) Or maybe we have the right forms and flavors, but we need a more Builder-like API to regularize it. Or maybe slicing them differently will be less confusing. Or more confusing. So, constructive input welcome! 
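[The "average sale by salesman" composition described above translates almost verbatim to the names that eventually shipped: mapping survived, the statistics collectors became summarizingInt/summarizingLong, and the statistics object exposes getAverage(). Txn here is a hypothetical stand-in for the thread's example class; not part of the original thread:]

```java
import java.util.IntSummaryStatistics;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AverageSaleBySeller {
    // Hypothetical stand-in for the thread's Txn class
    static final class Txn {
        final String seller;
        final int amount;
        Txn(String seller, int amount) { this.seller = seller; this.amount = amount; }
        String seller() { return seller; }
        int amount() { return amount; }
    }

    public static void main(String[] args) {
        // Group by seller, map each Txn to its amount, gather statistics
        // (min, max, sum, count, average) in a single pass over the data.
        Map<String, IntSummaryStatistics> bySeller =
                Stream.of(new Txn("ann", 100), new Txn("ann", 300), new Txn("bob", 50))
                      .collect(Collectors.groupingBy(
                              Txn::seller,
                              Collectors.mapping(Txn::amount,
                                      Collectors.summarizingInt(Integer::intValue))));

        System.out.println(bySeller.get("ann").getAverage());  // 200.0
        System.out.println(bySeller.get("bob").getCount());    // 1
    }
}
```

(In the released API the mapping step can also be skipped entirely, since summarizingInt takes the extractor directly: groupingBy(Txn::seller, summarizingInt(Txn::amount)).)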
From brian.goetz at oracle.com Thu Feb 7 17:38:56 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 20:38:56 -0500
Subject: Collectors update, bikeshed edition
In-Reply-To: <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com>
References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com>
Message-ID: <51145730.8090409@oracle.com>

Don has been filling my mailbox daily brainstorming alternate names for "collect". If we were to rename collect, the ones that seem most tolerable (largely on the basis of prior art) are "inject" and "fold". Inject is the name used by Smalltalk, Ruby, and Groovy for this. Again, not that I'm claiming that this is any sort of proof of superiority, but some people will be familiar with the name, and that's worth something.

When I first came across "inject" I didn't like it. Its primary value seemed to be that it rhymed with {sel,rej,det,inf,negl}ect. But, like "fold" (in the baking sense), there's a physical analogy of injecting ingredients one at a time into a larger entity or aggregation that absorbs them. It rubs most people the wrong way at first, but you do get used to it, and eventually it makes sense.

Anyway, not to get Don's hopes up, but inject does have one big benefit over collect, and that is the challenges faced by Collector. One of the bad things about the name Collector is that it *doesn't* actually collect things! Instead, it is a template/recipe/scheme for *how* to collect things. But we can't use the word Collection because that clearly means something else, and CollectorTemplate/CollectorStrategy/CollectorScheme all seem too roundabout.
But Injector works better as a name for what we now call Collector; you can convince yourself that a groupBy() injects data into a Map. Or, if you don't like that, the space of InjectionXxx is open (unlike with collect), such as InjectionScheme. I could tolerate switching to inject and some flavor of Injector/InjectionScheme. I could also tolerate fold(), but that is more likely to engender "that's not a fold", and Folder has the same problem as Collection. .NET calls this Aggregate, by the way. And Aggregator is clear too. Though Doug wants us to keep Aggregate free for some future Collection type, and given how rabid I've been about things like syntactic real-estate management, I think I must reluctantly agree. On 1/30/2013 3:42 PM, Raab, Donald wrote: > In my opinion, collect should return a collection. It should not reduce to any result. In the interest of time, here's a stab at an alternative list I came up with using the powers of thesaurus yesterday: > > into > gather > assemble > summarize > > The functionality currently called collect feels more like injectInto/inject in Smalltalk/Ruby/Groovy, but nothing is being injected into the method collect directly, but by the Collector (the R in makeResult()). InjectInto/inject is the equivalent of foldLeft. I would be less concerned over using injectInto or inject than collect, as at least it seems similar enough in that it can return any value, determined by the injector (currently called Collector). But folks here might consider injectInto and foldLeft too cryptic, so I decided to just shorten to into in the above list. > > http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#inject(groovy.lang.Closure) > > In the binary release I have (not sure if this is different in current source), two of the overloaded versions of the method collect create a FoldOp today (a hint), and the Collector interface has a method called accumulate and combine and is called MutableReducer in the javadoc. 
The methods named reduce also create FoldOp instances. This makes reduce and collect seem eerily similar. > > I find this a little confusing, but I have tried my best anyway to name that which by any other name seems to be more like injectInto/mapReduce/foldL/aggregate/etc. to me. > > Thoughts? > >>> I will do >>> my best and find an alternative that everyone else here likes. >> >> Thanks. >> >> -Doug > From david.holmes at oracle.com Thu Feb 7 18:03:47 2013 From: david.holmes at oracle.com (David Holmes) Date: Fri, 08 Feb 2013 12:03:47 +1000 Subject: Collectors update, bikeshed edition In-Reply-To: <51145730.8090409@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> Message-ID: <51145D03.5@oracle.com> I'm coming at this from a position of complete ignorance. I've never learnt functional programming though I had been exposed to some functional style of operations. The names collect/inject/fold are all equally meaningless to me. While I think I know what groupBy does I don't recognize it as a concrete instance of some abstract concept (folding, injection, aggregating etc). So my question is, for people who will learn Java through the primary/traditional channels (i.e. schools, college, university etc), where would they learn the underlying concepts that these APIs pertain to? And what terminology are they most likely to encounter there? FWIW I would much rather have a name with no obvious meaning than a name that I'm likely to think means something quite different to what it is. (unfortunately that is likely to apply to any verb we might use here.)
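One way to make that abstract-concept connection concrete: groupBy is itself a fold, with an empty map as the identity value and each element injected into its bucket. A minimal sketch (Collectors.groupingBy is the name as eventually shipped, which may differ from the draft; the hand-written loop spells out the equivalent fold):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByAsFold {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "cherry");

        // groupBy expressed through a Collector recipe:
        Map<Character, List<String>> grouped = words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0)));

        // The same operation as an explicit fold: an empty map is the
        // identity, and each element is injected into its bucket.
        Map<Character, List<String>> folded = new HashMap<>();
        for (String w : words) {
            folded.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(w);
        }

        System.out.println(grouped.equals(folded)); // true
    }
}
```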
David On 8/02/2013 11:38 AM, Brian Goetz wrote: > Don has been filling my mailbox daily brainstorming alternate names for > "collect". If we were to rename collect, the ones that seems most > tolerable (largely on the basis of prior art) are "inject" and "fold". > Inject is the name used by Smalltalk, Ruby, and Groovy for this. Again, > not that I'm claiming that this is any sort of proof of superiority, but > some people will be familiar with the name, and that's worth something. > > When I first came across "inject" I didn't like it. Its primary value > seemed to that it rhymed with {sel,rej,det,inf,negl}ect. But, like > "fold" (in the baking sense), there's a physical analogy of injecting > ingredients one at a time into a larger entity or aggregation that > absorbs them. It rubs most people the wrong way at first, but you do > get used to it, and eventually it makes sense. > > Anyway, not to get Don's hopes up, but inject does have one big benefit > over collect, and that is the challenges faced by Collector. One of the > bad things about the name Collector is that it *doesn't* actually > collect things! Instead, it is a template/recipe/scheme for *how* to > collect things. But we can't use the word Collection because that > clearly means something else, and > CollectorTemplate/CollectorStrategy/CollectorScheme all seem too > roundabout. > > But Injector works better as a name for what we now call Collector; you > can convince yourself that a groupBy() injects data into a Map. Or, if > you don't like that, the space of InjectionXxx is open (unlike with > collect), such as InjectionScheme. > > I could tolerate switching to inject and some flavor of > Injector/InjectionScheme. I could also tolerate fold(), but that is > more likely to engender "that's not a fold", and Folder has the same > problem as Collection. > > .NET calls this Aggregate, by the way. And Aggregator is clear too. 
> Though Doug wants us to keep Aggregate free for some future Collection > type, and given how rabid I've been about things like syntactic > real-estate management, I think I must reluctantly agree. > > > On 1/30/2013 3:42 PM, Raab, Donald wrote: >> In my opinion, collect should return a collection. It should not >> reduce to any result. In the interest of time, here's a stab at an >> alternative list I came up with using the powers of thesaurus yesterday: >> >> into >> gather >> assemble >> summarize >> >> The functionality currently called collect feels more like >> injectInto/inject in Smalltalk/Ruby/Groovy, but nothing is being >> injected into the method collect directly, but by the Collector (the R >> in makeResult()). InjectInto/inject is the equivalent of foldLeft. I >> would be less concerned over using injectInto or inject than collect, >> as at least it seems similar enough in that it can return any value, >> determined by the injector (currently called Collector). But folks >> here might consider injectInto and foldLeft too cryptic, so I decided >> to just shorten to into in the above list. >> >> http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#inject(groovy.lang.Closure) >> >> >> In the binary release I have (not sure if this is different in current >> source), two of the overloaded versions of the method collect create a >> FoldOp today (a hint), and the Collector interface has a method called >> accumulate and combine and is called MutableReducer in the javadoc. >> The methods named reduce also create FoldOp instances. This makes >> reduce and collect seem eerily similar. >> >> I find this a little confusing, but I have tried my best anyway to >> name that which by any other name seems to be more like >> injectInto/mapReduce/foldL/aggregate/etc. to me. >> >> Thoughts? >> >>>> I will do >>>> my best and find an alternative that everyone else here likes. >>> >>> Thanks. 
>>> >>> -Doug >> From joe.bowbeer at gmail.com Thu Feb 7 18:29:09 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 7 Feb 2013 18:29:09 -0800 Subject: Collectors update, bikeshed edition In-Reply-To: <51145D03.5@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> <51145D03.5@oracle.com> Message-ID: David, I doubt there is a clear answer to your question, though I would point to Ruby, Python and Groovy, in some order, for guidance as to what Java programming students may have been exposed to. This is an interesting take on the "inject" name in Ruby: http://railspikes.com/2008/8/11/understanding-map-and-reduce inject, reduce, and fold have all been used for "reduce" in Ruby. inject is apparently from Smalltalk? I like the section "Why 3+ names". My preferences are guided by how the simple examples "read", and I like the way "collect" reads in the Anagrams example: public static Stream<List<String>> anagrams(Stream<String> words) { return words.parallel().collectUnordered(groupingBy(Anagrams::key)) .values().parallelStream().filter(v -> v.size() > 1); } Based on my own experience learning this complicated API, I think the best approach for teaching these methods will be heavily reliant on a cookbook of simple examples. --Joe On Thu, Feb 7, 2013 at 6:03 PM, David Holmes wrote: > I'm coming at this from a position of complete ignorance. I've never > learnt functional programming though I had been exposed to some functional > style of operations. The names collect/inject/fold are all equally > meaningless to me.
While I think I know what groupBy does I don't recognize > it as a concrete instance of some abstract concept (folding, injection, > aggregating etc). > > So my question is, for people who will learn Java through the > primary/traditional channels (is schools, college, university etc), where > would they learn the underlying concepts that these API's pertain to? And > what terminology are they most likely to encounter there? > > FWIW I would much rather have a name with no obvious meaning than a name > that I'm likely to think means something quite different to what it is. > (unfortunately that is likely to apply to any verb we might use here.) > > David > > > On 8/02/2013 11:38 AM, Brian Goetz wrote: > >> Don has been filling my mailbox daily brainstorming alternate names for >> "collect". If we were to rename collect, the ones that seems most >> tolerable (largely on the basis of prior art) are "inject" and "fold". >> Inject is the name used by Smalltalk, Ruby, and Groovy for this. Again, >> not that I'm claiming that this is any sort of proof of superiority, but >> some people will be familiar with the name, and that's worth something. >> >> When I first came across "inject" I didn't like it. Its primary value >> seemed to that it rhymed with {sel,rej,det,inf,negl}ect. But, like >> "fold" (in the baking sense), there's a physical analogy of injecting >> ingredients one at a time into a larger entity or aggregation that >> absorbs them. It rubs most people the wrong way at first, but you do >> get used to it, and eventually it makes sense. >> >> Anyway, not to get Don's hopes up, but inject does have one big benefit >> over collect, and that is the challenges faced by Collector. One of the >> bad things about the name Collector is that it *doesn't* actually >> collect things! Instead, it is a template/recipe/scheme for *how* to >> collect things. 
But we can't use the word Collection because that >> clearly means something else, and >> CollectorTemplate/CollectorStrategy/CollectorScheme all seem too >> roundabout. >> >> But Injector works better as a name for what we now call Collector; you >> can convince yourself that a groupBy() injects data into a Map. Or, if >> you don't like that, the space of InjectionXxx is open (unlike with >> collect), such as InjectionScheme. >> >> I could tolerate switching to inject and some flavor of >> Injector/InjectionScheme. I could also tolerate fold(), but that is >> more likely to engender "that's not a fold", and Folder has the same >> problem as Collection. >> >> .NET calls this Aggregate, by the way. And Aggregator is clear too. >> Though Doug wants us to keep Aggregate free for some future Collection >> type, and given how rabid I've been about things like syntactic >> real-estate management, I think I must reluctantly agree. >> >> >> On 1/30/2013 3:42 PM, Raab, Donald wrote: >> >>> In my opinion, collect should return a collection. It should not >>> reduce to any result. In the interest of time, here's a stab at an >>> alternative list I came up with using the powers of thesaurus yesterday: >>> >>> into >>> gather >>> assemble >>> summarize >>> >>> The functionality currently called collect feels more like >>> injectInto/inject in Smalltalk/Ruby/Groovy, but nothing is being >>> injected into the method collect directly, but by the Collector (the R >>> in makeResult()). InjectInto/inject is the equivalent of foldLeft. I >>> would be less concerned over using injectInto or inject than collect, >>> as at least it seems similar enough in that it can return any value, >>> determined by the injector (currently called Collector). But folks >>> here might consider injectInto and foldLeft too cryptic, so I decided >>> to just shorten to into in the above list.
>>> >>> http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#inject(groovy.lang.Closure) >>> >>> >>> In the binary release I have (not sure if this is different in current >>> source), two of the overloaded versions of the method collect create a >>> FoldOp today (a hint), and the Collector interface has a method called >>> accumulate and combine and is called MutableReducer in the javadoc. >>> The methods named reduce also create FoldOp instances. This makes >>> reduce and collect seem eerily similar. >>> >>> I find this a little confusing, but I have tried my best anyway to >>> name that which by any other name seems to be more like >>> injectInto/mapReduce/foldL/aggregate/etc. to me. >>> >>> Thoughts? >>> >>> I will do >>>>> my best and find an alternative that everyone else here likes. >>>>> >>>> >>>> Thanks. >>>> >>>> -Doug >>>> >>> >>> From brian.goetz at oracle.com Thu Feb 7 19:09:32 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 07 Feb 2013 22:09:32 -0500 Subject: Collectors update, bikeshed edition In-Reply-To: <51145D03.5@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> <51145D03.5@oracle.com> Message-ID: <51146C6C.6070001@oracle.com> > So my question is, for people who will learn Java through the > primary/traditional channels (is schools, college, university etc), > where would they learn the underlying concepts that these API's pertain > to? And what terminology are they most likely to encounter there? Thanks David, for bringing us back to the primary challenge here: pedagogical.
Obviously we will do what we can in Javadoc (though haven't done so yet), but ultimately this will only scratch the surface. (Just as the Javadoc for JUC only scratched the surface for concurrency concepts. And we all know where that led.) > FWIW I would much rather have a name with no obvious meaning than a name > that I'm likely to think means something quite different to what it is. > (unfortunately that is likely to apply to any verb we might use here.) Right, so "inject" and "grobulate" are equally good by that metric :) From david.holmes at oracle.com Thu Feb 7 19:37:58 2013 From: david.holmes at oracle.com (David Holmes) Date: Fri, 08 Feb 2013 13:37:58 +1000 Subject: Collectors update, bikeshed edition In-Reply-To: <51146C6C.6070001@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> <51145D03.5@oracle.com> <51146C6C.6070001@oracle.com> Message-ID: <51147316.1050107@oracle.com> On 8/02/2013 1:09 PM, Brian Goetz wrote: >> So my question is, for people who will learn Java through the >> primary/traditional channels (is schools, college, university etc), >> where would they learn the underlying concepts that these API's pertain >> to? And what terminology are they most likely to encounter there? > > Thanks David, for bringing us back to the primary challenge here: > pedagogical. Obviously we will do what we can in Javadoc (though > haven't done so yet), but ultimately this will only scratch the surface. > (Just as the Javadoc for JUC only scratched the surface for > concurrency concepts. And we all know where that led.) 
I may be biased but I think we had the easier job with j.u.c >> FWIW I would much rather have a name with no obvious meaning than a name >> that I'm likely to think means something quite different to what it is. >> (unfortunately that is likely to apply to any verb we might use here.) > > Right, so "inject" and "grobulate" are equally good by that metric :) No, I have various notions of inject/injection - and after reading the link Joe posted (thanks Joe!) and some references therefrom, the relationship between inject and actually injecting something seems so tangential to the real functionality that it is obviously a terrible name. grobulate I quite like. ;-) David From kevinb at google.com Thu Feb 7 19:44:19 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 7 Feb 2013 19:44:19 -0800 Subject: Collectors update, bikeshed edition In-Reply-To: <51145730.8090409@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> Message-ID: I come from an environment where usage of @javax.inject.Inject is just utterly ubiquitous, so I really *can't* like inject() for this. On Thu, Feb 7, 2013 at 5:38 PM, Brian Goetz wrote: > Don has been filling my mailbox daily brainstorming alternate names for > "collect". If we were to rename collect, the ones that seems most > tolerable (largely on the basis of prior art) are "inject" and "fold". > Inject is the name used by Smalltalk, Ruby, and Groovy for this. Again, > not that I'm claiming that this is any sort of proof of superiority, but > some people will be familiar with the name, and that's worth something. > > When I first came across "inject" I didn't like it. 
Its primary value > seemed to that it rhymed with {sel,rej,det,inf,negl}ect. But, like "fold" > (in the baking sense), there's a physical analogy of injecting ingredients > one at a time into a larger entity or aggregation that absorbs them. It > rubs most people the wrong way at first, but you do get used to it, and > eventually it makes sense. > > Anyway, not to get Don's hopes up, but inject does have one big benefit > over collect, and that is the challenges faced by Collector. One of the > bad things about the name Collector is that it *doesn't* actually collect > things! Instead, it is a template/recipe/scheme for *how* to collect > things. But we can't use the word Collection because that clearly means > something else, and CollectorTemplate/CollectorStrategy/CollectorScheme > all seem too roundabout. > > But Injector works better as a name for what we now call Collector; you > can convince yourself that a groupBy() injects data into a Map. Or, if you > don't like that, the space of InjectionXxx is open (unlike with collect), > such as InjectionScheme. > > I could tolerate switching to inject and some flavor of > Injector/InjectionScheme. I could also tolerate fold(), but that is more > likely to engender "that's not a fold", and Folder has the same problem as > Collection. > > .NET calls this Aggregate, by the way. And Aggregator is clear too. > Though Doug wants us to keep Aggregate free for some future Collection > type, and given how rabid I've been about things like syntactic real-estate > management, I think I must reluctantly agree. > > > On 1/30/2013 3:42 PM, Raab, Donald wrote: > >> In my opinion, collect should return a collection. It should not reduce >> to any result.
In the interest of time, here's a stab at an alternative >> list I came up with using the powers of thesaurus yesterday: >> >> into >> gather >> assemble >> summarize >> >> The functionality currently called collect feels more like >> injectInto/inject in Smalltalk/Ruby/Groovy, but nothing is being injected >> into the method collect directly, but by the Collector (the R in >> makeResult()). InjectInto/inject is the equivalent of foldLeft. I would >> be less concerned over using injectInto or inject than collect, as at least >> it seems similar enough in that it can return any value, determined by the >> injector (currently called Collector). But folks here might consider >> injectInto and foldLeft too cryptic, so I decided to just shorten to into >> in the above list. >> >> http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#inject(groovy.lang.Closure) >> >> In the binary release I have (not sure if this is different in current >> source), two of the overloaded versions of the method collect create a >> FoldOp today (a hint), and the Collector interface has a method called >> accumulate and combine and is called MutableReducer in the javadoc. The >> methods named reduce also create FoldOp instances. This makes reduce and >> collect seem eerily similar. >> >> I find this a little confusing, but I have tried my best anyway to name >> that which by any other name seems to be more like >> injectInto/mapReduce/foldL/aggregate/etc. to me. >> >> Thoughts? >> >> I will do >>>> my best and find an alternative that everyone else here likes. >>>> >>> >>> Thanks. >>> >>> -Doug >>> >> >> -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From Donald.Raab at gs.com Thu Feb 7 19:49:36 2013 From: Donald.Raab at gs.com (Raab, Donald) Date: Thu, 7 Feb 2013 22:49:36 -0500 Subject: Collectors update, bikeshed edition In-Reply-To: <51145D03.5@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> <51145D03.5@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C3A8932D@GSCMAMP09EX.firmwide.corp.gs.com> > So my question is, for people who will learn Java through the > primary/traditional channels (is schools, college, university etc), > where would they learn the underlying concepts that these API's pertain > to? And what terminology are they most likely to encounter there? Many developers may learn them through a combination of Google search & StackOverflow (often found through Google). http://stackoverflow.com/questions/10875607/comprehensive-list-of-synonyms-for-reduce/10919742#10919742 From brian.goetz at oracle.com Fri Feb 8 07:25:05 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 10:25:05 -0500 Subject: Refactor of Collector interface Message-ID: <511518D1.5050706@oracle.com> FYI: In a recent refactoring, I changed: public interface Collector<T, R> { R makeResult(); void accumulate(R result, T value); R combine(R result, R other); } to public interface Collector<T, R> { Supplier<R> resultSupplier(); BiConsumer<R, T> accumulator(); BinaryOperator<R> combiner(); } Basically, this is a refactoring from typical interface to tuple-of-lambdas.
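Exercised end to end, the tuple-of-lambdas shape looks like this. A sketch using Collector.of from the API as eventually released; relative to the draft above, the released Collector gained a third type parameter, a finisher, and characteristics:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class TupleOfLambdas {
    public static void main(String[] args) {
        // The three lambdas correspond to resultSupplier/accumulator/combiner.
        Collector<String, List<String>, List<String>> toList = Collector.of(
                ArrayList::new,                       // result supplier
                List::add,                            // accumulator
                (a, b) -> { a.addAll(b); return a; }  // combiner
        );

        List<String> out = Stream.of("a", "b", "c").collect(toList);
        System.out.println(out); // [a, b, c]
    }
}
```

Because the Collector is just a bundle of functions, composing or adapting it never has to wrap an interface implementation around another one.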
What I found was that there was a lot of adaptation going on, where something would start out as a lambda, we'd wrap it with a Collector whose method invoked the lambda, then take a method reference to that wrapping method and then later wrap that with another Collector, etc. By keeping access to the functions directly, the Collectors code got simpler and less wrappy, since a lot of functions could just be passed right through without wrapping. And a lot of stupid adapter classes went away. While clearly we don't want all interfaces to evolve this way, this is one where *all* the many layers of manipulations are effectively function composition, and exposing the function-ness made that cleaner and more performant. So while I don't feel completely super-great about it, I think it's enough of a win to keep. From tim at peierls.net Fri Feb 8 07:31:08 2013 From: tim at peierls.net (Tim Peierls) Date: Fri, 8 Feb 2013 10:31:08 -0500 Subject: Refactor of Collector interface In-Reply-To: <511518D1.5050706@oracle.com> References: <511518D1.5050706@oracle.com> Message-ID: That's a good change. You don't need to defend it as a special case, though: I think it's actually clearer the new way. --tim On Fri, Feb 8, 2013 at 10:25 AM, Brian Goetz wrote: > FYI: In a recent refactoring, I changed: > > public interface Collector<T, R> { > R makeResult(); > void accumulate(R result, T value); > R combine(R result, R other); > } > > to > > public interface Collector<T, R> { > Supplier<R> resultSupplier(); > BiConsumer<R, T> accumulator(); > BinaryOperator<R> combiner(); > } > > Basically, this is a refactoring from typical interface to > tuple-of-lambdas. What I found was that there was a lot of adaptation > going on, where something would start out as a lambda, we'd wrap it with a > Collector whose method invoked the lambda, then take a method reference to > that wrapping method and then later wrap that with another Collector, etc.
> By keeping access to the functions directly, the Collectors code got > simpler and less wrappy, since a lot of functions could just be passed > right through without wrapping. And a lot of stupid adapter classes went > away. > > While clearly we don't want all interfaces to evolve this way, this is one > where *all* the many layers of manipulations are effectively function > composition, and exposing the function-ness made that cleaner and more > performant. So while I don't feel completely super-great about it, I think > its enough of a win to keep. > > From brian.goetz at oracle.com Fri Feb 8 07:47:17 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 10:47:17 -0500 Subject: explode In-Reply-To: References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: <51151E05.50908@oracle.com> OK, just to put it all down on "paper" where flatMap landed...are we OK with this? 
java.util.stream.FlatMapper: public interface FlatMapper<T, U> { void explodeInto(T element, Consumer<U> sink); interface ToInt<T> { void explodeInto(T element, IntConsumer sink); } interface ToLong<T> { void explodeInto(T element, LongConsumer sink); } interface ToDouble<T> { void explodeInto(T element, DoubleConsumer sink); } interface OfIntToInt { void explodeInto(int element, IntConsumer sink); } interface OfLongToLong { void explodeInto(long element, LongConsumer sink); } interface OfDoubleToDouble { void explodeInto(double element, DoubleConsumer sink); } } In Stream: <R> Stream<R> flatMap(Function<T, Stream<R>> mapper); <R> Stream<R> flatMap(FlatMapper<T, R> mapper); IntStream flatMap(FlatMapper.ToInt<T> mapper); LongStream flatMap(FlatMapper.ToLong<T> mapper); DoubleStream flatMap(FlatMapper.ToDouble<T> mapper); In IntStream (similar for {Double,Long}Stream): IntStream flatMap(IntFunction<IntStream> mapper); IntStream flatMap(FlatMapper.OfIntToInt mapper); And Remi wants one more static helper method in FlatMap: public static <T, U> FlatMapper<T, U> explodeCollection(Function<T, Collection<U>> function) I think this wraps up the explosive section of our program? On 2/6/2013 7:05 PM, Kevin Bourrillion wrote: > On Wed, Feb 6, 2013 at 3:30 PM, Brian Goetz > wrote: > > Stream<R> flatMap(FlatMapper<T, R>) > > Stream<R> flatMap(Function<T, Stream<R>>) > > > To make sure I understand: would these two behave identically? Would > they imaginably perform comparably? > > foos.stream().flatMap((t, consumer) -> > t.somethingThatGivesAStream().forEach(consumer)) > foos.stream().flatMap(t -> t.somethingThatGivesAStream()) > > Second question, why "FlatMapper.OfInt" here, but "IntSupplier" etc. > elsewhere? > > -- > Kevin Bourrillion | Java Librarian | Google, Inc.
|kevinb at google.com > From kevinb at google.com Fri Feb 8 08:35:06 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 08:35:06 -0800 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: Hmm, it's difficult for me to perceive what these benefits are from looking at the change to Collectors.java, and the file did get 70 lines longer as a result of the change fwiw, and seems to rely more on private abstract base classes that other Collector implementors won't have. (How do you get to side-by-side diff in this thing? I feel quite blind without it and am thus stuck in "I don't get it" mode.) On Fri, Feb 8, 2013 at 8:22 AM, Kevin Bourrillion wrote: > My subjective sense of good Java API design very strongly prefers the > "before" picture here, which I see as a lot more "Java-like", so I'm taking > a closer look. > > I assume that the trade-offs we're weighing here are purely to do with > what it's like to be a Collector implementor, correct? > > > On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz wrote: > >> FYI: In a recent refactoring, I changed: >> >> public interface Collector<T, R> { >> R makeResult(); >> void accumulate(R result, T value); >> R combine(R result, R other); >> } >> >> to >> >> public interface Collector<T, R> { >> Supplier<R> resultSupplier(); >> BiConsumer<R, T> accumulator(); >> BinaryOperator<R> combiner(); >> } >> >> Basically, this is a refactoring from typical interface to >> tuple-of-lambdas. What I found was that there was a lot of adaptation >> going on, where something would start out as a lambda, we'd wrap it with a >> Collector whose method invoked the lambda, then take a method reference to >> that wrapping method and then later wrap that with another Collector, etc. >> By keeping access to the functions directly, the Collectors code got >> simpler and less wrappy, since a lot of functions could just be passed >> right through without wrapping.
And a lot of stupid adapter classes went >> away. >> >> While clearly we don't want all interfaces to evolve this way, this is >> one where *all* the many layers of manipulations are effectively function >> composition, and exposing the function-ness made that cleaner and more >> performant. So while I don't feel completely super-great about it, I think >> its enough of a win to keep. >> >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 08:36:05 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 11:36:05 -0500 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: <51152975.1040305@oracle.com> Your subjective sense is accurate, which is why I brought this up. This may be an example where it is better to depart from the traditional approach. To your question, it depends what you mean by "purely to do with an implementor." Collector *users* are going to be burdened with the performance consequences of multiple layers of wrapping/conversion. The implementation used to be full of alternation between: interface Foo<T, U> { U transform(T t); } class FooAdapter<T, U> { FooAdapter(Function<T, U> lambda) { ... } U transform(T t) { return lambda.apply(t); } } and Function<T, U> parentTransformer = foo::transform; and back again, introducing layers of wrapping even when the function is not changing across layers. On 2/8/2013 11:22 AM, Kevin Bourrillion wrote: > My subjective sense of good Java API design very strongly prefers the > "before" picture here, which I see as a lot more "Java-like", so I'm > taking a closer look. > > I assume that the trade-offs we're weighing here are purely to do with > what it's like to be a Collector implementor, correct?
> > > On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz > wrote: > > FYI: In a recent refactoring, I changed: > > public interface Collector<T, R> { > R makeResult(); > void accumulate(R result, T value); > R combine(R result, R other); > } > > to > > public interface Collector<T, R> { > Supplier<R> resultSupplier(); > BiConsumer<R, T> accumulator(); > BinaryOperator<R> combiner(); > } > > Basically, this is a refactoring from typical interface to > tuple-of-lambdas. What I found was that there was a lot of > adaptation going on, where something would start out as a lambda, > we'd wrap it with a Collector whose method invoked the lambda, then > take a method reference to that wrapping method and then later wrap > that with another Collector, etc. By keeping access to the > functions directly, the Collectors code got simpler and less wrappy, > since a lot of functions could just be passed right through without > wrapping. And a lot of stupid adapter classes went away. > > While clearly we don't want all interfaces to evolve this way, this > is one where *all* the many layers of manipulations are effectively > function composition, and exposing the function-ness made that > cleaner and more performant. So while I don't feel completely > super-great about it, I think its enough of a win to keep. > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From kevinb at google.com Fri Feb 8 08:22:00 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 08:22:00 -0800 Subject: Refactor of Collector interface In-Reply-To: <511518D1.5050706@oracle.com> References: <511518D1.5050706@oracle.com> Message-ID: My subjective sense of good Java API design very strongly prefers the "before" picture here, which I see as a lot more "Java-like", so I'm taking a closer look. I assume that the trade-offs we're weighing here are purely to do with what it's like to be a Collector implementor, correct?
On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz wrote: > FYI: In a recent refactoring, I changed: > > public interface Collector { > R makeResult(); > void accumulate(R result, T value); > R combine(R result, R other); > } > > to > > public interface Collector { > Supplier resultSupplier(); > BiConsumer accumulator(); > BinaryOperator combiner(); > } > > Basically, this is a refactoring from typical interface to > tuple-of-lambdas. What I found was that there was a lot of adaptation > going on, where something would start out as a lambda, we'd wrap it with a > Collector whose method invoked the lambda, then take a method reference to > that wrapping method and then later wrap that with another Collector, etc. > By keeping access to the functions directly, the Collectors code got > simpler and less wrappy, since a lot of functions could just be passed > right through without wrapping. And a lot of stupid adapter classes went > away. > > While clearly we don't want all interfaces to evolve this way, this is one > where *all* the many layers of manipulations are effectively function > composition, and exposing the function-ness made that cleaner and more > performant. So while I don't feel completely super-great about it, I think > its enough of a win to keep. > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 08:39:46 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 11:39:46 -0500 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: <51152A52.8020200@oracle.com> > Hmm, it's difficult for me to perceive what these benefits are from > looking at the change to Collectors.java > , > and the file did get 70 lines longer as a result of the change fwiw, and > seems to rely more on private abstract base classes that other Collector > implementors won't have. 
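[Editor's note: the before/after interfaces quoted above also lost their generics in the archive. Reconstructed — the type parameters are my guess at the draft's intent, and the Collector interface that finally shipped differs from both shapes — the refactoring is:]

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// "Before": a conventional interface; every layer must implement (or
// wrap) these three methods itself.
interface CollectorBefore<T, R> {
    R makeResult();
    void accumulate(R result, T value);
    R combine(R result, R other);
}

// "After": a tuple of lambdas; every layer just hands back functions,
// which combinators can pass straight through without wrapping.
interface CollectorAfter<T, R> {
    Supplier<R> resultSupplier();
    BiConsumer<R, T> accumulator();
    BinaryOperator<R> combiner();
}
```

In the "after" shape a string-joining collector, for instance, is nothing more than StringBuilder::new, StringBuilder::append, and a combiner lambda.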
It actually got smaller when this transform was applied, but more stuff went into Collectors in the same changeset, such as the mapped() combinators. I have no objection to making that abstract base class public if that's a concern, though it's not really necessary since Collector writers can do without it: class FooCollector implements Collector { Supplier resultSupplier() { return Foo::new; } ... } The abstract class is mostly there as a "fake tuple" class for convenience of the Collectors implementation, and I think we're on record as saying that it is reasonable to expect users to write their own fake tuple classes. > (How do you get to side-by-side diff in this thing? I feel quite blind > without it and am thus stuck in "I don't get it" mode.) > > > On Fri, Feb 8, 2013 at 8:22 AM, Kevin Bourrillion > wrote: > > My subjective sense of good Java API design very strongly prefers > the "before" picture here, which I see as a lot more "Java-like", so > I'm taking a closer look. > > I assume that the trade-offs we're weighing here are purely to do > with what it's like to be a Collector implementor, correct? > > > On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz > wrote: > > FYI: In a recent refactoring, I changed: > > public interface Collector { > R makeResult(); > void accumulate(R result, T value); > R combine(R result, R other); > } > > to > > public interface Collector { > Supplier resultSupplier(); > BiConsumer accumulator(); > BinaryOperator combiner(); > } > > Basically, this is a refactoring from typical interface to > tuple-of-lambdas. What I found was that there was a lot of > adaptation going on, where something would start out as a > lambda, we'd wrap it with a Collector whose method invoked the > lambda, then take a method reference to that wrapping method and > then later wrap > that with another Collector, etc. 
By keeping > access to the functions directly, the Collectors code got > simpler and less wrappy, since a lot of functions could just be > passed right through without wrapping. And a lot of stupid > adapter classes went away. > > While clearly we don't want all interfaces to evolve this way, > this is one where *all* the many layers of manipulations are > effectively function composition, and exposing the function-ness > made that cleaner and more performant. So while I don't feel > completely super-great about it, I think its enough of a win to > keep. > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From kevinb at google.com Fri Feb 8 08:43:35 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 08:43:35 -0800 Subject: Refactor of Collector interface In-Reply-To: <51152975.1040305@oracle.com> References: <511518D1.5050706@oracle.com> <51152975.1040305@oracle.com> Message-ID: Oh, it's about performance. I see that now. Well, if it's possible to just tell us, "Hey, a group-by of 10000 elements used to incur N bytes of garbage and now causes only M," that's very easy to know how to react to. On Fri, Feb 8, 2013 at 8:36 AM, Brian Goetz wrote: > Your subjective sense is accurate, which is why I brought this up. This > may be an example where is better to depart from the traditional approach. > > To your question, it depends what you mean by "purely to do with an > implementor." Collector *users* are going to be burdened with the > performance consequences of multiple layers of wrapping/conversion. > > The implementation used to be full of alternation between: > > interface Foo { > U transform(T t); > } > > class FooAdapter { > FooAdapter(Function lambda) { ... 
} > > U transform(T t) { return lambda.apply(t); } > } > > and > > Function parentTransformer = foo::transform; > > and back again, introducing layers of wrapping even when the function is > not changing across layers. > > > > > On 2/8/2013 11:22 AM, Kevin Bourrillion wrote: > >> My subjective sense of good Java API design very strongly prefers the >> "before" picture here, which I see as a lot more "Java-like", so I'm >> taking a closer look. >> >> I assume that the trade-offs we're weighing here are purely to do with >> what it's like to be a Collector implementor, correct? >> >> >> On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz > > wrote: >> >> FYI: In a recent refactoring, I changed: >> >> public interface Collector { >> R makeResult(); >> void accumulate(R result, T value); >> R combine(R result, R other); >> } >> >> to >> >> public interface Collector { >> Supplier resultSupplier(); >> BiConsumer accumulator(); >> BinaryOperator combiner(); >> } >> >> Basically, this is a refactoring from typical interface to >> tuple-of-lambdas. What I found was that there was a lot of >> adaptation going on, where something would start out as a lambda, >> we'd wrap it with a Collector whose method invoked the lambda, then >> take a method reference to that wrapping method and then later wrap >> that with another Collector, etc. By keeping access to the >> functions directly, the Collectors code got simpler and less wrappy, >> since a lot of functions could just be passed right through without >> wrapping. And a lot of stupid adapter classes went away. >> >> While clearly we don't want all interfaces to evolve this way, this >> is one where *all* the many layers of manipulations are effectively >> function composition, and exposing the function-ness made that >> cleaner and more performant. So while I don't feel completely >> super-great about it, I think its enough of a win to keep. >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> >> > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From tim at peierls.net Fri Feb 8 09:13:09 2013 From: tim at peierls.net (Tim Peierls) Date: Fri, 8 Feb 2013 12:13:09 -0500 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: On Fri, Feb 8, 2013 at 11:22 AM, Kevin Bourrillion wrote: > My subjective sense of good Java API design very strongly prefers the > "before" picture here, which I see as a lot more "Java-like", so I'm taking > a closer look. The before picture is certainly more pre-lambda-Java-like, but I don't think it's fair to knock something meant to fit well with a new language feature by those rules. I thought the return types of the after picture conveyed more clearly the idea of "I'm going to need a way to supply result objects, and way to accumulate elements into result objects, and a way to combine result objects." And seeing those interface types as return types reinforced my understanding of those types. I assume that the trade-offs we're weighing here are purely to do with what > it's like to be a Collector implementor, correct? > Well, since I persist in preferring the after picture -- maybe the impending blizzard has addled my senses -- I'd say the benefit to Collector implementers is secondary. --tim From kevinb at google.com Fri Feb 8 09:30:45 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 09:30:45 -0800 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: On Fri, Feb 8, 2013 at 9:13 AM, Tim Peierls wrote: My subjective sense of good Java API design very strongly prefers the >> "before" picture here, which I see as a lot more "Java-like", so I'm taking >> a closer look. > > > The before picture is certainly more pre-lambda-Java-like, but I don't > think it's fair to knock something meant to fit well with a new language > feature by those rules. 
> I think I'm only really saying the same thing Brian is when he says "While clearly we don't want all interfaces to evolve this way..." and "while I don't feel completely super-great about it....", etc. I'd prefer to not rely on the taste argument if we can treat the benefits concretely. > > I thought the return types of the after picture conveyed more clearly the > idea of "I'm going to need a way to supply result objects, and way to > accumulate elements into result objects, and a way to combine result > objects." And seeing those interface types as return types reinforced my > understanding of those types. > > > I assume that the trade-offs we're weighing here are purely to do with >> what it's like to be a Collector implementor, correct? >> > > Well, since I persist in preferring the after picture -- maybe the > impending blizzard has addled my senses -- I'd say the benefit to Collector > implementers is secondary. > > --tim > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From tim at peierls.net Fri Feb 8 10:41:22 2013 From: tim at peierls.net (Tim Peierls) Date: Fri, 8 Feb 2013 13:41:22 -0500 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: OK, throwing away the taste argument. And I don't feel completely super-great about anything, so I'm right there. On Fri, Feb 8, 2013 at 12:30 PM, Kevin Bourrillion wrote: > On Fri, Feb 8, 2013 at 9:13 AM, Tim Peierls wrote: > > My subjective sense of good Java API design very strongly prefers the >>> "before" picture here, which I see as a lot more "Java-like", so I'm taking >>> a closer look. >> >> >> The before picture is certainly more pre-lambda-Java-like, but I don't >> think it's fair to knock something meant to fit well with a new language >> feature by those rules. >> > > I think I'm only really saying the same thing Brian is when he says "While > clearly we don't want all interfaces to evolve this way..." 
and "while I > don't feel completely super-great about it....", etc. > > I'd prefer to not rely on the taste argument if we can treat the benefits > concretely. > > > >> >> I thought the return types of the after picture conveyed more clearly the >> idea of "I'm going to need a way to supply result objects, and way to >> accumulate elements into result objects, and a way to combine result >> objects." And seeing those interface types as return types reinforced my >> understanding of those types. >> >> >> I assume that the trade-offs we're weighing here are purely to do with >>> what it's like to be a Collector implementor, correct? >>> >> >> Well, since I persist in preferring the after picture -- maybe the >> impending blizzard has addled my senses -- I'd say the benefit to Collector >> implementers is secondary. >> >> --tim >> > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From tim at peierls.net Fri Feb 8 12:55:00 2013 From: tim at peierls.net (Tim Peierls) Date: Fri, 8 Feb 2013 15:55:00 -0500 Subject: explode In-Reply-To: <51151E05.50908@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <51151E05.50908@oracle.com> Message-ID: Modulo the names, seems reasonable. Don't know why the extra static method, but it doesn't wreck things for me. On Fri, Feb 8, 2013 at 10:47 AM, Brian Goetz wrote: > OK, just to put it all down on "paper" where flatMap landed...are we OK > with this? 
> > java.util.stream.FlatMapper: > > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > > In Stream: > > Stream flatMap(Function> mapper); > > Stream flatMap(FlatMapper mapper); > > IntStream flatMap(FlatMapper.ToInt mapper); > LongStream flatMap(FlatMapper.ToLong mapper); > DoubleStream flatMap(FlatMapper.ToDouble mapper); > > In IntStream (similar for {Double,Long}Stream): > > IntStream flatMap(IntFunction mapper); > IntStream flatMap(FlatMapper.OfIntToInt mapper); > > > And Remi wants one more static helper method in FlatMap: > > > public static FlatMapper > explodeCollection(Function> > function) > > > I think this wraps up the explosive section of our program? > > > > On 2/6/2013 7:05 PM, Kevin Bourrillion wrote: > >> On Wed, Feb 6, 2013 at 3:30 PM, Brian Goetz > > wrote: >> >> Stream flatMap(FlatMapper) >> >> Stream flatMap(Function>) >> >> >> To make sure I understand: would these two behave identically? Would >> they imaginably perform comparably? >> >> foos.stream().flatMap((t, consumer) -> >> t.somethingThatGivesAStream().**forEach(consumer)) >> foos.stream().flatMap(t -> t.somethingThatGivesAStream()) >> >> Second question, why "FlatMapper.OfInt" here, but "IntSupplier" etc. >> elsewhere? >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
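[Editor's note: the FlatMapper proposal above lost its type parameters in the archive. A reconstruction of the draft shape — this is the API under discussion here, not the one that eventually shipped; Java 8 kept only the Function-returning-a-Stream form of flatMap — could read:]

```java
import java.util.function.Consumer;
import java.util.function.DoubleConsumer;
import java.util.function.IntConsumer;
import java.util.function.LongConsumer;

interface FlatMapper<T, U> {
    // Push zero or more results for this element into the downstream sink,
    // instead of allocating an intermediate Stream per element.
    void explodeInto(T element, Consumer<U> sink);

    interface ToInt<T> {
        void explodeInto(T element, IntConsumer sink);
    }

    interface ToLong<T> {
        void explodeInto(T element, LongConsumer sink);
    }

    interface ToDouble<T> {
        void explodeInto(T element, DoubleConsumer sink);
    }

    interface OfIntToInt {
        void explodeInto(int element, IntConsumer sink);
    }

    interface OfLongToLong {
        void explodeInto(long element, LongConsumer sink);
    }

    interface OfDoubleToDouble {
        void explodeInto(double element, DoubleConsumer sink);
    }
}
```

The sink-pushing shape answers Kevin's quoted question: the two flatMap overloads would behave identically, but the FlatMapper form can skip per-element Stream creation.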
|kevinb at google.com >> >> > From brian.goetz at oracle.com Fri Feb 8 14:30:14 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 17:30:14 -0500 Subject: stream() / parallelStream() methods Message-ID: <51157C76.8040303@oracle.com> Currently, we define stream() and parallelStream() on Collection, with default of: default Stream stream() { return Streams.stream(() -> Streams.spliterator(iterator(), size(), Spliterator.SIZED), Spliterator.SIZED); } default Stream parallelStream() { return stream().parallel(); } So the default behavior is "get an Iterator, turn it into a Spliterator, and turn that into a Stream." Then the specific Collection classes generally override it, providing better Spliterator implementations and more precise flag sets. Several people have requested moving stream/parallelStream up to Iterable, on the theory that (a) the default implementations that would live there are not terrible (only difference between that and Collection default is Iterable doesn't know size()), (b) Collection could still override with the size-injecting version, and (c) a lot of APIs are designed to return Iterable as the "least common denominator" aggregate, and being able to stream them would be useful. I don't see any problem with moving these methods up to Iterable. Any objections? From kevinb at google.com Fri Feb 8 15:20:44 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:20:44 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51157C76.8040303@oracle.com> References: <51157C76.8040303@oracle.com> Message-ID: Yeah, I think we have little choice but to do this. It makes sense, and without it, Guava will just end up having to offer a static helper method to return (iterable instanceof Collection) ? ((Collection) iterable).stream() : Streams.stream(Streams.spliteratorUnknownSize(iterable.iterator()), 0). Blech. 
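[Editor's note: Kevin's static helper, written out against the class and method names that eventually shipped in java.util rather than the draft's Streams.* factories; MoreStreams is a hypothetical holder class:]

```java
import java.util.Collection;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class MoreStreams {
    private MoreStreams() {}

    // Prefer the collection's own stream() (which knows its size and splits
    // well); otherwise fall back to an unknown-size spliterator over the
    // iterator, with no characteristic flags.
    @SuppressWarnings("unchecked")
    static <T> Stream<T> stream(Iterable<T> iterable) {
        return (iterable instanceof Collection)
                ? ((Collection<T>) iterable).stream()
                : StreamSupport.stream(
                        Spliterators.spliteratorUnknownSize(iterable.iterator(), 0),
                        false);
    }
}
```

Moving stream() up to Iterable makes exactly this dispatch unnecessary, which is Kevin's point.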
(Tangentially, I would really love to drop parallelStream() and let people call stream().parallel(). But I haven't managed to scour the archives to find if that argument's already suitably played out.) On Fri, Feb 8, 2013 at 2:30 PM, Brian Goetz wrote: > Currently, we define stream() and parallelStream() on Collection, with > default of: > > default Stream stream() { > return Streams.stream(() -> Streams.spliterator(iterator()**, > size(), Spliterator.SIZED), Spliterator.SIZED); > } > > default Stream parallelStream() { > return stream().parallel(); > } > > So the default behavior is "get an Iterator, turn it into a Spliterator, > and turn that into a Stream." Then the specific Collection classes > generally override it, providing better Spliterator implementations and > more precise flag sets. > > > Several people have requested moving stream/parallelStream up to Iterable, > on the theory that (a) the default implementations that would live there > are not terrible (only difference between that and Collection default is > Iterable doesn't know size()), (b) Collection could still override with the > size-injecting version, and (c) a lot of APIs are designed to return > Iterable as the "least common denominator" aggregate, and being able to > stream them would be useful. I don't see any problem with moving these > methods up to Iterable. > > Any objections? > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 15:24:02 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 18:24:02 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> Message-ID: <51158912.7030005@oracle.com> > (Tangentially, I would really love to drop parallelStream() and let > people call stream().parallel(). But I haven't managed to scour the > archives to find if that argument's already suitably played out.) 
Direct version is more performant, in that it requires less wrapping (to turn a stream into a parallel stream, you have to first create the sequential stream, then transfer ownership of its state into a new Stream.) But, inconsistently, we have dropped a number of parallel stream factories along the same lines, because the 2x explosion of intGenerator/parallelIntGenerator was too much. But considering this is just one new method in Iterable/Collection, and it does make a difference in a common case, the status quo does seem reasonable. From kevinb at google.com Fri Feb 8 15:23:10 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:23:10 -0800 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> Message-ID: Can we make our best attempt to specify Iterable.stream() better than Iterable.iterator() was? I haven't worked out how to *say* this yet, but the idea is: - If at all possible to ensure that each call to stream() returns an actual working and *independent* stream, you really really should do that. - If that's just not possible, the second call to stream() really really should throw ISE. (Yes, I do realize most Iterables by far will just inherit stream(), so it will only be as repeat-usable as iterator() is.) On Fri, Feb 8, 2013 at 3:20 PM, Kevin Bourrillion wrote: > Yeah, I think we have little choice but to do this. It makes sense, and > without it, Guava will just end up having to offer a static helper method > to return (iterable instanceof Collection) ? ((Collection) > iterable).stream() : > Streams.stream(Streams.spliteratorUnknownSize(iterable.iterator()), 0). > Blech. > > (Tangentially, I would really love to drop parallelStream() and let people > call stream().parallel(). But I haven't managed to scour the archives to > find if that argument's already suitably played out.) 
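[Editor's note: one way to realize the "second call really really should throw ISE" idea Kevin floats above — nothing like this was specified; OneShotIterable is a hypothetical name:]

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

// Wraps a single Iterator as an Iterable that can be traversed exactly
// once; a second call to iterator() (and hence a second call to an
// inherited default stream()) fails fast instead of silently misbehaving.
class OneShotIterable<T> implements Iterable<T> {
    private final Iterator<T> iterator;
    private final AtomicBoolean consumed = new AtomicBoolean();

    OneShotIterable(Iterator<T> iterator) {
        this.iterator = iterator;
    }

    @Override
    public Iterator<T> iterator() {
        if (!consumed.compareAndSet(false, true)) {
            throw new IllegalStateException("this Iterable was already consumed");
        }
        return iterator;
    }
}
```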
> > > > On Fri, Feb 8, 2013 at 2:30 PM, Brian Goetz wrote: > >> Currently, we define stream() and parallelStream() on Collection, with >> default of: >> >> default Stream stream() { >> return Streams.stream(() -> Streams.spliterator(iterator()**, >> size(), Spliterator.SIZED), Spliterator.SIZED); >> } >> >> default Stream parallelStream() { >> return stream().parallel(); >> } >> >> So the default behavior is "get an Iterator, turn it into a Spliterator, >> and turn that into a Stream." Then the specific Collection classes >> generally override it, providing better Spliterator implementations and >> more precise flag sets. >> >> >> Several people have requested moving stream/parallelStream up to >> Iterable, on the theory that (a) the default implementations that would >> live there are not terrible (only difference between that and Collection >> default is Iterable doesn't know size()), (b) Collection could still >> override with the size-injecting version, and (c) a lot of APIs are >> designed to return Iterable as the "least common denominator" aggregate, >> and being able to stream them would be useful. I don't see any problem >> with moving these methods up to Iterable. >> >> Any objections? >> >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Fri Feb 8 15:25:19 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:25:19 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51158912.7030005@oracle.com> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> Message-ID: On Fri, Feb 8, 2013 at 3:24 PM, Brian Goetz wrote: (Tangentially, I would really love to drop parallelStream() and let >> people call stream().parallel(). But I haven't managed to scour the >> archives to find if that argument's already suitably played out.) 
>> > > Direct version is more performant, in that it requires less wrapping (to > turn a stream into a parallel stream, you have to first create the > sequential stream, then transfer ownership of its state into a new Stream.) > But really a lot of *work* has already happened by then? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 15:28:22 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 18:28:22 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> Message-ID: <51158A16.4000703@oracle.com> Depends how seriously you are counting. Doug counts individual object creations and virtual invocations on the way to a parallel operation, because until you start forking, you're on the wrong side of Amdahl's law -- this is all "serial fraction" that happens before you can fork any work, which pushes your breakeven threshold further out. So getting the setup path for parallel ops fast is valuable. On 2/8/2013 6:25 PM, Kevin Bourrillion wrote: > On Fri, Feb 8, 2013 at 3:24 PM, Brian Goetz > wrote: > > (Tangentially, I would really love to drop parallelStream() and let > people call stream().parallel(). But I haven't managed to scour the > archives to find if that argument's already suitably played out.) > > > Direct version is more performant, in that it requires less wrapping > (to turn a stream into a parallel stream, you have to first create > the sequential stream, then transfer ownership of its state into a > new Stream.) > > > But really a lot of /work/ has already happened by then? > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > From kevinb at google.com Fri Feb 8 15:28:46 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:28:46 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51157C76.8040303@oracle.com> References: <51157C76.8040303@oracle.com> Message-ID: Here's the other issue this raises. To my knowledge there's no Streamable interface defined. Maybe it wasn't needed; I'm not sure. But once Iterable looks like this, now Iterable becomes the new Streamable. If you support a stream(), you'll implement Iterable to expose that fact. This is a little bit weird. I'm undecided on how big a problem it would be, but overall, Streamable seems like a pretty normal thing to have. On Fri, Feb 8, 2013 at 2:30 PM, Brian Goetz wrote: > Currently, we define stream() and parallelStream() on Collection, with > default of: > > default Stream stream() { > return Streams.stream(() -> Streams.spliterator(iterator()**, > size(), Spliterator.SIZED), Spliterator.SIZED); > } > > default Stream parallelStream() { > return stream().parallel(); > } > > So the default behavior is "get an Iterator, turn it into a Spliterator, > and turn that into a Stream." Then the specific Collection classes > generally override it, providing better Spliterator implementations and > more precise flag sets. > > > Several people have requested moving stream/parallelStream up to Iterable, > on the theory that (a) the default implementations that would live there > are not terrible (only difference between that and Collection default is > Iterable doesn't know size()), (b) Collection could still override with the > size-injecting version, and (c) a lot of APIs are designed to return > Iterable as the "least common denominator" aggregate, and being able to > stream them would be useful. I don't see any problem with moving these > methods up to Iterable. > > Any objections? > > -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 15:32:08 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 18:32:08 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> Message-ID: <51158AF8.5010509@oracle.com> > Here's the other issue this raises. > > To my knowledge there's no Streamable interface defined. Right. Earlier drafts had one (ask Doug to recount the "OMG so many interfaces" horror of Iteration 2), and since then we've been working really hard to eliminate each incremental public type, as each adds API surface area. I think we've been really successful at this; I'd hate to slide backwards. > Maybe it > wasn't needed; I'm not sure. But once Iterable looks like this, now > Iterable becomes the new Streamable. If you support a stream(), you'll > implement Iterable to expose that fact. This is a little bit weird. > I'm undecided on how big a problem it would be, but overall, > Streamable seems like a pretty normal thing to have. Leading question: if everything that is Iterable is effectively Streamable (because Iterable has a stream()) method, and everything Streamable is effectively Iterable (because you can turn a Spliterator into an Iterator), aren't they then the same abstraction? From kevinb at google.com Fri Feb 8 15:35:08 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:35:08 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51158A16.4000703@oracle.com> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> Message-ID: Doug, I am extraordinarily unmoved by this concern. :-) Does a break-even point moving a few elements in either direction really matter? On Fri, Feb 8, 2013 at 3:28 PM, Brian Goetz wrote: Depends how seriously you are counting. 
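[Editor's note: Brian's "same abstraction" point can be made concrete with the adapters that shipped in java.util — Spliterators provides both directions of the round trip:]

```java
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class SameAbstraction {
    // Iterable -> Stream: any iterator can back a (sequential) stream.
    static <T> Stream<T> toStream(Iterable<T> iterable) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterable.iterator(), 0),
                false);
    }

    // Stream -> Iterable: any spliterator can be walked as an iterator.
    // (One-shot: each iterator() call consumes the captured stream.)
    static <T> Iterable<T> toIterable(Stream<T> stream) {
        return () -> Spliterators.iterator(stream.spliterator());
    }
}
```

Since each view converts losslessly into the other, a separate Streamable interface would name the same abstraction Iterable already does.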
Doug counts individual object > creations and virtual invocations on the way to a parallel operation, > because until you start forking, you're on the wrong side of Amdahl's law > -- this is all "serial fraction" that happens before you can fork any work, > which pushes your breakeven threshold further out. So getting the setup > path for parallel ops fast is valuable. > > > On 2/8/2013 6:25 PM, Kevin Bourrillion wrote: > >> On Fri, Feb 8, 2013 at 3:24 PM, Brian Goetz > > wrote: >> >> (Tangentially, I would really love to drop parallelStream() and >> let >> people call stream().parallel(). But I haven't managed to scour >> the >> archives to find if that argument's already suitably played out.) >> >> >> Direct version is more performant, in that it requires less wrapping >> (to turn a stream into a parallel stream, you have to first create >> the sequential stream, then transfer ownership of its state into a >> new Stream.) >> >> >> But really a lot of /work/ has already happened by then? >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >> >> > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Fri Feb 8 15:39:06 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:39:06 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51158AF8.5010509@oracle.com> References: <51157C76.8040303@oracle.com> <51158AF8.5010509@oracle.com> Message-ID: On Fri, Feb 8, 2013 at 3:32 PM, Brian Goetz wrote: Here's the other issue this raises. >> >> To my knowledge there's no Streamable interface defined. >> > > Right. Earlier drafts had one (ask Doug to recount the "OMG so many > interfaces" horror of Iteration 2), and since then we've been working > really hard to eliminate each incremental public type, as each adds API > surface area. I think we've been really successful at this; I'd hate to > slide backwards. > > > Maybe it >> wasn't needed; I'm not sure. 
But once Iterable looks like this, now >> Iterable becomes the new Streamable. If you support a stream(), you'll >> implement Iterable to expose that fact. This is a little bit weird. >> I'm undecided on how big a problem it would be, but overall, >> Streamable seems like a pretty normal thing to have. >> > > Leading question: if everything that is Iterable is effectively Streamable > (because Iterable has a stream()) method, and everything Streamable is > effectively Iterable (because you can turn a Spliterator into an Iterator), > aren't they then the same abstraction? > Yes: just making sure we really want that. If I fail in my bid to kill parallelStream() then could we at least keep it on Collection? With Iterable already growing from 1 to 3 methods, that one extra is pretty significant bulk. (Still, let's kill it entirely :-)) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From joe.bowbeer at gmail.com Fri Feb 8 15:41:32 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Fri, 8 Feb 2013 15:41:32 -0800 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> Message-ID: This concern over reuse rings a bell. Are these the concerns that led us *not* to burden Iterable with these methods? We haven't talked about ISE in months, so we did something right :) On Fri, Feb 8, 2013 at 3:23 PM, Kevin Bourrillion wrote: > Can we make our best attempt to specify Iterable.stream() better than > Iterable.iterator() was? > > I haven't worked out how to *say* this yet, but the idea is: > > - If at all possible to ensure that each call to stream() returns an > actual working and *independent* stream, you really really should do that. > - If that's just not possible, the second call to stream() really really > should throw ISE. > > (Yes, I do realize most Iterables by far will just inherit stream(), so it > will only be as repeat-usable as iterator() is.) 
> > > On Fri, Feb 8, 2013 at 3:20 PM, Kevin Bourrillion wrote: > >> Yeah, I think we have little choice but to do this. It makes sense, and >> without it, Guava will just end up having to offer a static helper method >> to return (iterable instanceof Collection) ? ((Collection) >> iterable).stream() : >> Streams.stream(Streams.spliteratorUnknownSize(iterable.iterator()), 0). >> Blech. >> >> (Tangentially, I would really love to drop parallelStream() and let >> people call stream().parallel(). But I haven't managed to scour the >> archives to find if that argument's already suitably played out.) >> >> >> >> On Fri, Feb 8, 2013 at 2:30 PM, Brian Goetz wrote: >> >>> Currently, we define stream() and parallelStream() on Collection, with >>> default of: >>> >>> default Stream stream() { >>> return Streams.stream(() -> Streams.spliterator(iterator()**, >>> size(), Spliterator.SIZED), Spliterator.SIZED); >>> } >>> >>> default Stream parallelStream() { >>> return stream().parallel(); >>> } >>> >>> So the default behavior is "get an Iterator, turn it into a Spliterator, >>> and turn that into a Stream." Then the specific Collection classes >>> generally override it, providing better Spliterator implementations and >>> more precise flag sets. >>> >>> >>> Several people have requested moving stream/parallelStream up to >>> Iterable, on the theory that (a) the default implementations that would >>> live there are not terrible (only difference between that and Collection >>> default is Iterable doesn't know size()), (b) Collection could still >>> override with the size-injecting version, and (c) a lot of APIs are >>> designed to return Iterable as the "least common denominator" aggregate, >>> and being able to stream them would be useful. I don't see any problem >>> with moving these methods up to Iterable. >>> >>> Any objections? >>> >>> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com >> > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From brian.goetz at oracle.com Fri Feb 8 15:44:34 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 18:44:34 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158AF8.5010509@oracle.com> Message-ID: <51158DE2.1060407@oracle.com> > If I fail in my bid to kill parallelStream() then could we at least keep > it on Collection? With Iterable already growing from 1 to 3 methods, > that one extra is pretty significant bulk. (Still, let's kill it > entirely :-)) I'm not sure I get this "bulking" argument. The implementation on Iterable will be a default. Let's say you're implementing an Iterable. There are two ends of the spectrum: 1. You are building a high-performance data structure. You are definitely going to want to create your own spliterators and offer the best parallel performance. So you are happy to see parallelStream(). 2. You are wrapping some other aggregates that you just want to be Iterable, so you cobble together an Iterator from whatever you've got. In which case you're likely to take the default stream/parallelStream implementations. So you don't care that Iterable has parallelStream. So at the ends, either you like it, or you're agnostic. What's in the middle that's different? I'm not seeing it. From kevinb at google.com Fri Feb 8 15:47:46 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:47:46 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51158DE2.1060407@oracle.com> References: <51157C76.8040303@oracle.com> <51158AF8.5010509@oracle.com> <51158DE2.1060407@oracle.com> Message-ID: Sure, sure: it's much more about perception than specific impediment to usage. On Fri, Feb 8, 2013 at 3:44 PM, Brian Goetz wrote: > If I fail in my bid to kill parallelStream() then could we at least keep >> it on Collection? 
With Iterable already growing from 1 to 3 methods, >> that one extra is pretty significant bulk. (Still, let's kill it >> entirely :-)) >> > > I'm not sure I get this "bulking" argument. > > The implementation on Iterable will be a default. > > Let's say you're implementing an Iterable. There are two ends of the > spectrum: > > 1. You are building a high-performance data structure. You are > definitely going to want to create your own spliterators and offer the best > parallel performance. So you are happy to see parallelStream(). > > 2. You are wrapping some other aggregates that you just want to be > Iterable, so you cobble together an Iterator from whatever you've got. In > which case you're likely to take the default stream/parallelStream > implementations. So you don't care that Iterable has parallelStream. > > So at the ends, either you like it, or you're agnostic. What's in the > middle that's different? I'm not seeing it. > > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From dl at cs.oswego.edu Sat Feb 9 04:09:32 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 09 Feb 2013 07:09:32 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> Message-ID: <51163C7C.4050509@cs.oswego.edu> On 02/08/13 18:35, Kevin Bourrillion wrote: > Doug, I am extraordinarily unmoved by this concern. :-) Does a break-even point > moving a few elements in either direction really matter? People dealing with parallel library support need some attitude adjustment about such things. On a soon-to-be-typical machine, every cycle you waste setting up parallelism costs you say 64 cycles. You would probably have had a different reaction if it required 64 object creations to start a parallel computation. 
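[Editor's note: for concreteness, a rough sketch of what the hoisted defaults could look like. The interface name is made up; it is written against the factory names that eventually shipped rather than the draft Streams class, and it leans on Iterable's own spliterator() default, which reports no SIZED characteristic.]

```java
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Sketch only: what stream()/parallelStream() defaults hoisted from
// Collection up to an Iterable-like interface might look like.
interface StreamableIterable<T> extends Iterable<T> {
    // Iterable does not know size(), so the spliterator is unsized;
    // Collection would override with a size-injecting spliterator.
    default Stream<T> stream() {
        return StreamSupport.stream(spliterator(), false);
    }
    default Stream<T> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }
}
```

This is Brian's case 2: an implementor who only supplies iterator() inherits both methods for free, at the cost of an unsized, characteristics-free spliterator.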
That said, I'm always completely supportive of forcing implementors to work harder for the sake of better APIs, so long as the APIs do not rule out efficient implementation. So if killing parallelStream is really important, we'll find some way to turn stream().parallel() into a bit-flip or somesuch. -Doug > > > On Fri, Feb 8, 2013 at 3:28 PM, Brian Goetz > wrote: > > Depends how seriously you are counting. Doug counts individual object > creations and virtual invocations on the way to a parallel operation, > because until you start forking, you're on the wrong side of Amdahl's law -- > this is all "serial fraction" that happens before you can fork any work, > which pushes your breakeven threshold further out. So getting the setup > path for parallel ops fast is valuable. > > > On 2/8/2013 6:25 PM, Kevin Bourrillion wrote: > > On Fri, Feb 8, 2013 at 3:24 PM, Brian Goetz > __>> wrote: > > (Tangentially, I would really love to drop parallelStream() and let > people call stream().parallel(). But I haven't managed to scour the > archives to find if that argument's already suitably played out.) > > > Direct version is more performant, in that it requires less wrapping > (to turn a stream into a parallel stream, you have to first create > the sequential stream, then transfer ownership of its state into a > new Stream.) > > > But really a lot of /work/ has already happened by then? > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > > > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From kevinb at google.com Sat Feb 9 07:36:41 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Sat, 9 Feb 2013 07:36:41 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51163C7C.4050509@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> Message-ID: On Sat, Feb 9, 2013 at 4:09 AM, Doug Lea
wrote: On 02/08/13 18:35, Kevin Bourrillion wrote: > >> Doug, I am extraordinarily unmoved by this concern. :-) Does a >> break-even point >> moving a few elements in either direction really matter? >> > > People dealing with parallel library support need some attitude > adjustment about such things. On a soon-to-be-typical machine, > every cycle you waste setting up parallelism costs you say 64 cycles. > You would probably have had a different reaction if it required 64 > object creations to start a parallel computation. > Well, that would also have 64x the effect on young gen GC. I still wouldn't immediately blanch at the 64 allocations. Do users really want to use parallelism to get savings *that* small? I thought we would care more about the cases in which the parallelism is a huge win, not so marginal. That said, I'm always completely supportive of forcing implementors > to work harder for the sake of better APIs, so long as the > APIs do not rule out efficient implementation. So if killing > parallelStream is really important, we'll find some way to > turn stream().parallel() into a bit-flip or somesuch. > I will stop short of trying to convince us it's "important", but I would definitely agree that if the cost is only some implementation ugliness, that shouldn't be enough to justify the method existing. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Sat Feb 9 07:42:04 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 09 Feb 2013 16:42:04 +0100 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> Message-ID: <51166E4C.1070900@univ-mlv.fr> On 02/09/2013 04:36 PM, Kevin Bourrillion wrote: > On Sat, Feb 9, 2013 at 4:09 AM, Doug Lea
> wrote: > > On 02/08/13 18:35, Kevin Bourrillion wrote: > > Doug, I am extraordinarily unmoved by this concern. :-) Does > a break-even point > moving a few elements in either direction really matter? > > > People dealing with parallel library support need some attitude > adjustment about such things. On a soon-to-be-typical machine, > every cycle you waste setting up parallelism costs you say 64 cycles. > You would probably have had a different reaction if it required 64 > object creations to start a parallel computation. > > > Well, that would also have 64x the effect on young gen GC. > > I still wouldn't immediately blanch at the 64 allocations. Do users > really want to use parallelism to get savings /that/ small? I thought > we would care more about the cases in which the parallelism is a huge > win, not so marginal. It depends if the operation that you perform for each item take a long time or not. > > > That said, I'm always completely supportive of forcing implementors > to work harder for the sake of better APIs, so long as the > APIs do not rule out efficient implementation. So if killing > parallelStream is really important, we'll find some way to > turn stream().parallel() into a bit-flip or somesuch. > > > I will stop short of trying to convince us it's "important", but I > would definitely agree that if the cost is only some implementation > ugliness, that shouldn't be enough to justify the method existing. 
Rémi From forax at univ-mlv.fr Sat Feb 9 07:44:34 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 09 Feb 2013 16:44:34 +0100 Subject: explode In-Reply-To: <51151E05.50908@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <51151E05.50908@oracle.com> Message-ID: <51166EE2.4090408@univ-mlv.fr> On 02/08/2013 04:47 PM, Brian Goetz wrote: > OK, just to put it all down on "paper" where flatMap landed...are we > OK with this? > > java.util.stream.FlatMapper: > > public interface FlatMapper<T, U> { > void explodeInto(T element, Consumer<U> sink); > > interface ToInt<T> { > void explodeInto(T element, IntConsumer sink); > } > > interface ToLong<T> { > void explodeInto(T element, LongConsumer sink); > } > > interface ToDouble<T> { > void explodeInto(T element, DoubleConsumer sink); > } > > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > > In Stream: > > <R> Stream<R> flatMap(Function<T, Stream<R>> mapper); just a wildcard issue: <R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper); Rémi From dl at cs.oswego.edu Sat Feb 9 07:47:28 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 09 Feb 2013 10:47:28 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> Message-ID: <51166F90.301@cs.oswego.edu> On 02/09/13 10:36, Kevin Bourrillion wrote: > I still wouldn't immediately blanch at the 64 allocations. Do users really want > to use parallelism to get savings /that/ small? I thought we would care more > about the cases in which the parallelism is a huge win, not so marginal. 
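[Editor's note: Remi's wildcard point is easy to see with a concrete mapper. The names below are invented for illustration; the widened signature he proposes is the one Java 8 ultimately shipped for Stream.flatMap.]

```java
import java.util.function.Function;
import java.util.stream.Stream;

// With the wildcard-widened signature
//   <R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper)
// a mapper declared against a supertype of T, returning a stream of a
// subtype of R, still applies directly -- no adapter lambda needed.
class WildcardDemo {
    // Declared on Object, producing Stream<Integer>:
    static final Function<Object, Stream<Integer>> CODE_POINTS =
            o -> o.toString().chars().boxed();

    static long countCodePoints(Stream<String> words) {
        // Compiles only because of ? super String and ? extends Stream<? extends Number>.
        Stream<Number> flat = words.flatMap(CODE_POINTS);
        return flat.count();
    }
}
```

Without the wildcards, CODE_POINTS could not be passed to a Stream<String>'s flatMap at all, even though it is perfectly usable there.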
If you take the "what's one more cycle" point of view consistently, then it would never be worth trying to parallelize anything. So minimizing seq overhead while keeping nice APIs is the *only* success criterion. > > I will stop short of trying to convince us it's "important", but I would > definitely agree that if the cost is only some implementation ugliness, that > shouldn't be enough to justify the method existing. Here's another breach in my promise not to have an opinion about anything in the Stream API: I think "parallelStream()" is much nicer than "stream().parallel()". -Doug From forax at univ-mlv.fr Sat Feb 9 07:57:11 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 09 Feb 2013 16:57:11 +0100 Subject: Refactor of Collector interface In-Reply-To: <51152975.1040305@oracle.com> References: <511518D1.5050706@oracle.com> <51152975.1040305@oracle.com> Message-ID: <511671D7.6070607@univ-mlv.fr> On 02/08/2013 05:36 PM, Brian Goetz wrote: > Your subjective sense is accurate, which is why I brought this up. > This may be an example where it is better to depart from the traditional > approach. > > To your question, it depends what you mean by "purely to do with an > implementor." Collector *users* are going to be burdened with the > performance consequences of multiple layers of wrapping/conversion. > > The implementation used to be full of alternation between: > > interface Foo<T, U> { > U transform(T t); > } > > class FooAdapter<T, U> { > FooAdapter(Function<T, U> lambda) { ... } > > U transform(T t) { return lambda.apply(t); } > } > > and > > Function<T, U> parentTransformer = foo::transform; > > and back again, introducing layers of wrapping even when the function > is not changing across layers. Yes, the other problem is that if we have something recursive, we could easily end up with a chain of adapters as long as the number of recursive calls. 
This problem frequently arises in dynamic language runtimes when, for example, you convert from j.l.String to GroovyString and back again. The only sane way to implement that is to provide a way to box and unbox things. So having Collector be a triple seems to be the only sane choice. Rémi > > > > On 2/8/2013 11:22 AM, Kevin Bourrillion wrote: >> My subjective sense of good Java API design very strongly prefers the >> "before" picture here, which I see as a lot more "Java-like", so I'm >> taking a closer look. >> >> I assume that the trade-offs we're weighing here are purely to do with >> what it's like to be a Collector implementor, correct? >> >> >> On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz > > wrote: >> >> FYI: In a recent refactoring, I changed: >> >> public interface Collector<T, R> { >> R makeResult(); >> void accumulate(R result, T value); >> R combine(R result, R other); >> } >> >> to >> >> public interface Collector<T, R> { >> Supplier<R> resultSupplier(); >> BiConsumer<R, T> accumulator(); >> BinaryOperator<R> combiner(); >> } >> >> Basically, this is a refactoring from a typical interface to a >> tuple-of-lambdas. What I found was that there was a lot of >> adaptation going on, where something would start out as a lambda, >> we'd wrap it with a Collector whose method invoked the lambda, then >> take a method reference to that wrapping method and then later wrap >> that with another Collector, etc. By keeping access to the >> functions directly, the Collectors code got simpler and less wrappy, >> since a lot of functions could just be passed right through without >> wrapping. And a lot of stupid adapter classes went away. >> >> While clearly we don't want all interfaces to evolve this way, this >> is one where *all* the many layers of manipulations are effectively >> function composition, and exposing the function-ness made that >> cleaner and more performant. So while I don't feel completely >> super-great about it, I think it's enough of a win to keep. 
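[Editor's note: a runnable sketch of the "tuple-of-lambdas" shape under discussion. Type parameters and names are assumed from context; the Collector that ultimately shipped in java.util.stream adds a finisher and a characteristics set on top of this triple.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// Assumed reconstruction of the refactored interface.
interface TripleCollector<T, R> {
    Supplier<R> resultSupplier();
    BiConsumer<R, T> accumulator();
    BinaryOperator<R> combiner();
}

class TripleCollectors {
    // Because the three pieces are exposed as functions, a collector is
    // just three lambdas/method refs -- no adapter class wrapping each one.
    static <T> TripleCollector<T, List<T>> toList() {
        return new TripleCollector<T, List<T>>() {
            public Supplier<List<T>> resultSupplier() { return ArrayList::new; }
            public BiConsumer<List<T>, T> accumulator() { return List::add; }
            public BinaryOperator<List<T>> combiner() {
                return (left, right) -> { left.addAll(right); return left; };
            }
        };
    }
}
```

Downstream code can now pass resultSupplier()/accumulator()/combiner() through directly, which is exactly the "less wrappy" property Brian describes.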
>> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >> From kevinb at google.com Sat Feb 9 08:07:56 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Sat, 9 Feb 2013 08:07:56 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51166F90.301@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> Message-ID: Belated disclaimer: one should always read my comments on performance as "please educate me because I don't get it", not "you're all wrong". :-) On Sat, Feb 9, 2013 at 7:47 AM, Doug Lea
wrote: On 02/09/13 10:36, Kevin Bourrillion wrote: > > I still wouldn't immediately blanch at the 64 allocations. Do users >> really want >> to use parallelism to get savings /that/ small? I thought we would care >> more >> >> about the cases in which the parallelism is a huge win, not so marginal. >> > > If you take the "what's one more cycle" point of view consistently, then > it would never be worth trying to parallelize anything. So minimizing > seq overhead while keeping nice APIs is the *only* success criterion. > I will stop short of trying to convince us it's "important", but I would >> definitely agree that if the cost is only some implementation ugliness, >> that >> shouldn't be enough to justify the method existing.Here's >> > > Here's another breach in my promise not to have an opinion > about anything in the Steam API: I think "parallelStream()" > is much nicer than "stream().parallel()". But the choice isn't precisely between those two; it's between having one or both. I assume that the stream().parallel() option has to exist regardless, and so users will encounter it in code, and they *will* have to start discussions with each other about "why did you do s().p() instead of .pS(), or vice versa, and what's the difference anyway?" Then, every time someone *adds* a stream() method to their type they then face the question of whether they're supposed to add parallelStream() too, etc. I don't think absolute normalization is an API design goal in itself, but having two very similar ways to do the same thing is a definite smell. -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From dl at cs.oswego.edu Sat Feb 9 08:31:17 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 09 Feb 2013 11:31:17 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> Message-ID: <511679D5.2090508@cs.oswego.edu> On 02/09/13 11:07, Kevin Bourrillion wrote: > But the choice isn't precisely between those two; it's between having one or > both. I assume that the stream().parallel() option has to exist regardless, and > so users will encounter it in code, and they /will/ have to start discussions > with each other about "why did you do s().p() instead of .pS(), or vice versa, > and what's the difference anyway?" Then, every time someone /adds/ a stream() > method to their type they then face the question of whether they're supposed to > add parallelStream() too, etc. > Well, I don't like the parallel() method on Stream anyway, so I'll let others take over from here... -Doug From brian.goetz at oracle.com Sat Feb 9 08:31:52 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 09 Feb 2013 11:31:52 -0500 Subject: stream() / parallelStream() methods In-Reply-To: <51166F90.301@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> Message-ID: <511679F8.1040407@oracle.com> > Here's another breach in my promise not to have an opinion > about anything in the Steam API: I think "parallelStream()" > is much nicer than "stream().parallel()". I do too, but I also recognize that is mostly just taste and we could get used to either. 
But, let's turn the question around, because we have an inconsistent API right now with respect to stream constructors, and we should decide whether we want to choose that deliberately (which I think is fine), or go one way or the other. We have a number of factories in Streams like: Streams.intRange(from, to) Streams.generate(T, UnaryOperator) We do *not* have explicit parallel versions of each of these; we did originally, and to prune down the API surface area, we cut them on the theory that dropping 20+ methods from the API was worth the tradeoff of the surface yuckiness and performance cost of .intRange(...).parallel(). But we did not make that choice with Collection. We could either remove the Collection.parallelStream(), or we could add the parallel versions of all the generators, or we could do nothing and leave it as is. I think all are justifiable on API design grounds. I kind of like the status quo, despite its inconsistency. Instead of having 2N stream construction methods, we have N+1 -- but that extra 1 covers a huge number of cases, because it is inherited by every Collection. So I can justify to myself why having that extra 1 method is worth it, and why accepting the inconsistency of going no further is acceptable. Do others disagree? Is N+1 the practical choice here? Or should we go for the purity of N? Or the convenience and consistency of 2N? Or is there some even better N+3, for some other specially chosen cases we want to give special support to? 
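[Editor's note: the N+1 trade-off in today's terms. Method names are as they eventually shipped -- IntStream.range rather than the draft's Streams.intRange -- and the two forms are assumed equivalent in result, though not in setup cost.]

```java
import java.util.List;
import java.util.stream.IntStream;

class NPlusOneDemo {
    // The "+1": every Collection inherits one dedicated parallel entry point.
    static long sumViaCollection(List<Integer> list) {
        return list.parallelStream().mapToLong(Integer::longValue).sum();
    }

    // The generator factories instead pay the wrap-then-flip cost of .parallel().
    static long sumViaRange(int from, int to) {
        return IntStream.range(from, to).parallel().asLongStream().sum();
    }
}
```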
From brian.goetz at oracle.com Sat Feb 9 08:41:35 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 09 Feb 2013 11:41:35 -0500 Subject: stream() / parallelStream() methods In-Reply-To: <511679D5.2090508@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> Message-ID: <51167C3F.1000704@oracle.com> > Well, I don't like the parallel() method on Stream anyway, so I'll > let others take over from here... You can't drop a bomb like that and walk away! You have to explain why you don't like it, because I suspect most people's first guess about why will be wrong. I'll take my best stab at explaining why: because it, like the stateful methods (sort, distinct, limit) which you also don't like, moves us incrementally farther from being able to express stream pipelines in terms of traditional data-parallel constructs, which further constrains our ability to map them directly to tomorrow's computing substrate, whether that be vector processors, FPGAs, GPUs, or whatever we cook up. Filter-map-reduce maps very cleanly to all sorts of parallel computing substrates; filter-parallel-map-sequential-sorted-limit-parallel-map-uniq-reduce does not. So the whole API design here embodies many tensions between making it easy to express things the user is likely to want to express, and doing so in a manner that we can predictably make fast with transparent cost models. 
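[Editor's note: the contrast can be made concrete. Illustrative only; the ranges and operations are chosen arbitrarily.]

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PipelineShapes {
    // Stateless filter-map-reduce: each element is processed independently,
    // so the pipeline splits cleanly on any parallel substrate.
    static int statelessSum() {
        return IntStream.rangeClosed(1, 100).parallel()
                .filter(i -> i % 3 == 0)
                .map(i -> i * 2)
                .sum();
    }

    // Stateful sorted()/limit(): barriers between stages -- the whole input
    // must be seen before the first sorted element can be emitted.
    static List<Integer> statefulFirstTen() {
        return IntStream.rangeClosed(1, 100).parallel()
                .map(i -> 100 - i)   // 99 down to 0
                .sorted()
                .limit(10)
                .boxed()
                .collect(Collectors.toList());
    }
}
```

Both produce deterministic results, but only the first shape composes into the "traditional data-parallel constructs" Brian is talking about.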
From dl at cs.oswego.edu Sat Feb 9 08:49:17 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 09 Feb 2013 11:49:17 -0500 Subject: stream() / parallelStream() methods In-Reply-To: <51167C3F.1000704@oracle.com> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> <51167C3F.1000704@oracle.com> Message-ID: <51167E0D.8060202@cs.oswego.edu> On 02/09/13 11:41, Brian Goetz wrote: > You can't drop a bomb like that and walk away! You have to explain why you > don't like it, because I suspect most people's first guess about why will be wrong. > > I'll take my best stab at explaining why: Yes, thanks. Stateful Stream methods are clearly problematic. Most people like them anyway because they are convenient. And in any case, whenever they show up, many API discussions follow. > because it (like the stateful methods > (sort, distinct, limit)) which you also don't like, move us incrementally > farther from being able to express stream pipelines in terms of traditional > data-parallel constructs, which further constrains our ability to to map them > directly to tomorrow's computing substrate, whether that be vector processors, > FPGAs, GPUs, or whatever we cook up. > > Filter-map-reduce map very cleanly to all sorts of parallel computing > substrates; filter-parallel-map-sequential-sorted-limit-parallel-map-uniq-reduce > does not. > > So the whole API design here embodies many tensions between making it easy to > express things the user is likely to want to express, and doing is in a manner > that we can predictably make fast with transparent cost models. 
> From joe.bowbeer at gmail.com Sat Feb 9 10:13:59 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 9 Feb 2013 10:13:59 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51167E0D.8060202@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> <51167C3F.1000704@oracle.com> <51167E0D.8060202@cs.oswego.edu> Message-ID: I'm OK with parallelStream(). It did raise a question when I used it for the first time, but it was also easy to find in the IDE. I wanted "parallel" and knew what I was getting into; as opposed to someone splicing a parallel() into their expression as an afterthought.. The separation of parallel() from stream() also presents more possibilities for the user, and therefore also raises questions. Where in the expression does parallel() belong? In the parallel string-compare example, I had a choice between boxed().parallel() or parallel().boxed(). Which is "right"? Or maybe I should insert parallel() even later in the expression? On 02/09/13 11:41, Brian Goetz wrote: You can't drop a bomb like that and walk away! You have to explain why you > don't like it, because I suspect most people's first guess about why will > be wrong. > > I'll take my best stab at explaining why: > Yes, thanks. Stateful Stream methods are clearly problematic. Most people like them anyway because they are convenient. And in any case, whenever they show up, many API discussions follow. because it (like the stateful methods > (sort, distinct, limit)) which you also don't like, move us incrementally > farther from being able to express stream pipelines in terms of traditional > data-parallel constructs, which further constrains our ability to to map > them > directly to tomorrow's computing substrate, whether that be vector > processors, > FPGAs, GPUs, or whatever we cook up. 
> > Filter-map-reduce map very cleanly to all sorts of parallel computing > substrates; filter-parallel-map-**sequential-sorted-limit-** > parallel-map-uniq-reduce > does not. > > So the whole API design here embodies many tensions between making it easy > to > express things the user is likely to want to express, and doing is in a > manner > that we can predictably make fast with transparent cost models. > > From tim at peierls.net Sat Feb 9 10:22:42 2013 From: tim at peierls.net (Tim Peierls) Date: Sat, 9 Feb 2013 13:22:42 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> <51167C3F.1000704@oracle.com> <51167E0D.8060202@cs.oswego.edu> Message-ID: On Sat, Feb 9, 2013 at 1:13 PM, Joe Bowbeer wrote: > The separation of parallel() from stream() also presents more > possibilities for the user, and therefore also raises questions. Where in > the expression does parallel() belong? In the parallel string-compare > example, I had a choice between boxed().parallel() or parallel().boxed(). > Which is "right"? Or maybe I should insert parallel() even later in the > expression? > Yup, that's the sort of uncertainty that really slows me down. All those choices to make, especially when the type system doesn't yield clues. I'd rather give up on clean ways to express certain intricate (and uncommon?) combinations than have to make choices like these. 
--tim From brian.goetz at oracle.com Sat Feb 9 10:48:49 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 09 Feb 2013 13:48:49 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> <51167C3F.1000704@oracle.com> <51167E0D.8060202@cs.oswego.edu> Message-ID: <51169A11.1050903@oracle.com> > The separation of parallel() from stream() also presents more > possibilities for the user, and therefore also raises questions. Where > in the expression does parallel() belong? In the parallel string-compare > example, I had a choice between boxed().parallel() or > parallel().boxed(). Which is "right"? Or maybe I should insert > parallel() even later in the expression? Good question. Clearly more education will be needed here. There's two axes on which to evaluate how to use .parallel() and .sequential(); semantic and performance. The semantics are straightforward. If a stream starts out sequential, then: foo.filter(...).parallel().map(...) will do the filtering sequentially and the mapping in parallel. Whereas foo.parallel().filter(...).map(...) will do both in parallel. I think users can understand that aspect of it; it seems pretty straightforward. If the stream is already s/p then s()/p() are no-ops (well, a single virtual call and a field read.) On the performance front, that's always a moving target, but currently .parallel() on a "naked" (no ops added yet, as in the second case) stream is much cheaper than .parallel() on a stream that already has ops (like in the first case.) From sam at sampullara.com Sat Feb 9 11:26:25 2013 From: sam at sampullara.com (Sam Pullara) Date: Sat, 9 Feb 2013 11:26:25 -0800 Subject: Internal and External truncation conditions Message-ID: Now that we are further along, I wanted to bring this up again. 
I don't think that forEachUntil is sufficient for handling internal and external conditions that should truncate stream processing. I've also looked at CloseableStream and that doesn't appear to help since it isn't possible to wrap a Stream (say an infinite stream) with a CloseableStream and get the necessary semantics of cancellation. Also, other APIs that don't consider that you might give them a CloseableStream will likely still give you back a Stream thus losing the semantics. Is everyone else happy with forEachUntil and CloseableStream? Sam ---------- Forwarded message ---------- From: Sam Pullara Date: Mon, Dec 31, 2012 at 8:34 AM Subject: Re: Cancelation -- use cases To: Brian Goetz Cc: "lambda-libs-spec-experts at openjdk.java.net" I think we are conflating two things with this solution and it doesn't work for them in my mind. Here is what I would like the solution to cover: - External conditions (cancellation, cleanup) - Internal conditions (gating based on count, elements and results) The first one may be the only one that works in the parallel case. It should likely be implemented with .close() on stream that would stop the stream as soon as possible. This would be useful for things like timeouts. Kind of like calling close on an inputstream in the middle of reading it. The other one I think is necessary and hard to implement correctly with the parallel case. For instance I would like to say: stream.gate(e -> e < 10).forEach(e -> ?) OR stream.gate( (e, i) -> e < 10 || i > 10).forEach(e -> ?) // i is the number of the current element That should give me every element in the stream until an element isn't < 10 and then stop processing elements. Further, there should be some way for the stream source to be notified that we are done consuming it in case it is of unknown length or consumes resources. That would be more like (assuming we add a Runnable call to Timer): Stream stream = ?. 
new Timer().schedule(() -> stream.close(), 5000);
stream.forEach(e -> ...);

OR

stream.forEach(e -> { try { ... } catch (Exception ex) { stream.close(); } });

Sadly, the first gate() case doesn't work well when parallelized. I'm willing to just specify what the behavior is for that case to get it into the API. For example, I would probably say something like "the gate will need to return false once per split to stop processing". In either of these cases I think one of the motivations needs to be that the stream may be using resources and we need to tell the source that we are done consuming it. For example, if the stream is sourced from a file, database or even a large amount of memory, there should be a notification mechanism for doneness that will allow those resources to be returned before the stream is exhausted. To that end I think that Stream should extend AutoCloseable, but overridden with no checked exception.

interface Stream extends AutoCloseable {
    /**
     * Closes this stream and releases any system resources associated
     * with it. If the stream is already closed then invoking this
     * method has no effect. Close is automatically called when the
     * stream is exhausted. After this is called, no further elements
     * will be processed by the stream, but currently processing elements
     * will complete normally. Calling other methods on a closed stream will
     * produce IllegalStateExceptions.
     */
    void close();

    /**
     * When the continueProcessing function returns false, no further
     * elements will be processed after the gate. In the parallel stream
     * case no further elements will be processed in the current split.
     */
    Stream gate(Predicate continueProcessing);

    /**
     * As gate, with the addition of the current element number.
     */
    Stream gate(BiPredicate continueProcessing);
}

This API avoids a lot of side effects that forEachUntil would require to implement these use cases.
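[A minimal sketch of how a gate() of this shape could be emulated on today's API with a stateful filter. Both gate() and this emulation are hypothetical; the emulation is sequential-only and discards elements after the gate closes rather than short-circuiting, so it does not terminate an infinite source.]

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GateSketch {
    // Hypothetical gate(): pass elements through until the predicate first
    // fails, then drop everything after it. Once closed, the gate stays closed.
    static <T> Stream<T> gate(Stream<T> in, Predicate<T> keepGoing) {
        AtomicBoolean open = new AtomicBoolean(true);
        return in.sequential().filter(e -> {
            if (open.get() && keepGoing.test(e)) {
                return true;      // gate still open: element flows downstream
            }
            open.set(false);      // first failing element closes the gate for good
            return false;
        });
    }

    public static void main(String[] args) {
        List<Integer> kept = gate(Stream.of(1, 5, 12, 3), e -> e < 10)
                .collect(Collectors.toList());
        System.out.println(kept); // [1, 5]: 3 is dropped even though 3 < 10
    }
}
```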
Sam

On Dec 30, 2012, at 7:53 PM, Brian Goetz wrote:

Here's a lower-complexity version of cancel, that still satisfies (in series or in parallel) use cases like the following:

> - Find the best possible move after thinking for 5 seconds
> - Find the first solution that is better than X
> - Gather solutions until we have 100 of them

without bringing in the complexity or time/space overhead of dealing with encounter order.

Since the forEach() operation works exclusively on the basis of temporal/arrival order rather than spatial/encounter order (elements are passed to the lambda in whatever order they are available, in whatever thread they are available), we could make a canceling variant of forEach:

.forEachUntil(Block sink, BooleanSupplier until)

Here, there is no confusion about what happens in the ordered case, no need to buffer elements, etc. Elements flow into the block until the termination condition transpires, at which point there are no more splits and existing splits dispense no more elements.

I implemented this (it was trivial) and wrote a simple test program to calculate primes sequentially and in parallel, counting how many could be calculated in a fixed amount of time, starting from an infinite generator and filtering out composites:

Streams.iterate(from, i -> i + 1)   // sequential
       .filter(i -> isPrime(i))
       .forEachUntil(i -> {
           chm.put(i, true);
       }, () -> System.currentTimeMillis() >= start + num);

vs

Streams.iterate(from, i -> i + 1)   // parallel
       .parallel()
       .filter(i -> isPrime(i))
       .forEachUntil(i -> {
           chm.put(i, true);
       }, () -> System.currentTimeMillis() >= start + num);

On a 4-core Q6600 system, in a fixed amount of time, the parallel version gathered ~3x as many primes.

In terms of being able to perform useful computations on infinite streams, this seems a pretty attractive price-performer; lower spec and implementation complexity, and covers many of the use cases which would otherwise be impractical to attack with the stream approach.
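[The sequential half of the forEachUntil proposal above can be sketched as a pull loop over the stream's iterator. forEachUntil is the proposed method, not an existing one; the parallel variant needs engine support and is not captured by this emulation.]

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class ForEachUntilSketch {
    // Sequential emulation of the proposed forEachUntil(sink, until):
    // push elements into the sink until the termination condition fires.
    static <T> void forEachUntil(Stream<T> s, Consumer<T> sink, BooleanSupplier until) {
        Iterator<T> it = s.iterator();
        while (!until.getAsBoolean() && it.hasNext()) {
            sink.accept(it.next());
        }
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        // Infinite generator, truncated by the condition rather than the source.
        forEachUntil(Stream.iterate(1, i -> i + 1), seen::add, () -> seen.size() >= 5);
        System.out.println(seen); // [1, 2, 3, 4, 5]
    }
}
```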
On 12/28/2012 11:20 AM, Brian Goetz wrote:

I've been working through some alternatives for cancellation support in infinite streams. Looking to gather some use case background to help evaluate the alternatives.

In the serial case, the "gate" approach works fine: after some criteria transpires, stop sending elements downstream. The pipeline flushes the elements it has, and completes early.

In the parallel unordered case, the gate approach similarly works fine: after the cancelation criteria occurs, no new splits are created, and existing splits dispense no more elements. The computation similarly quiesces after elements currently being processed are completed, possibly along with any up-tree merging to combine results.

It is the parallel ordered case that is tricky. Supposing we partition a stream into

(a1,a2,a3), (a4,a5,a6)

and suppose further we happen to be processing a5 when the bell goes off. Do we want to wait for all a_i, i<5, to finish before letting the computation quiesce?

My gut says: for the things we intend to cancel, most of them will be order-insensitive anyway. Things like:

- Find the best possible move after thinking for 5 seconds
- Find the first solution that is better than X
- Gather solutions until we have 100 of them

I believe the key use case for cancelation here will be when we are chewing on potentially infinite streams of events (probably backed by IO) where we want to chew until we're asked to shut down, and want to get as much parallelism as we can cheaply. Which suggests to me the intersection between order-sensitive stream pipelines and cancelable stream pipelines is going to be pretty small indeed.

Anyone want to add to this model of use cases for cancelation?

From joe.bowbeer at gmail.com Sat Feb 9 11:36:59 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Sat, 9 Feb 2013 11:36:59 -0800
Subject: Internal and External truncation conditions
In-Reply-To:
References:
Message-ID:

I haven't used either of these.
If I wanted to create an example I'd probably start with a stream of lines() from a BufferedReader, and then tack on a use case.

try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) {
    r.lines().forEachUntil(...);
}

Do you have something specific in mind?

On Sat, Feb 9, 2013 at 11:26 AM, Sam Pullara wrote:
> Now that we are further along, I wanted to bring this up again. I
> don't think that forEachUntil is sufficient for handling internal and
> external conditions that should truncate stream processing.
> [...]
From sam at sampullara.com Sat Feb 9 11:55:04 2013
From: sam at sampullara.com (Sam Pullara)
Date: Sat, 9 Feb 2013 11:55:04 -0800
Subject: Internal and External truncation conditions
In-Reply-To:
References:
Message-ID:

Let's say you only want to process lines until one matches a regex. Here is one way you could try to implement it:

AtomicBoolean done = new AtomicBoolean();
Pattern regex = ...;
try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) {
    r.lines().forEachUntil((line) -> {
        if (!done.get()) {
            if (regex.matcher(line).matches()) {
                done.set(true);
            } else {
                ...process the line...
            }
        }
    }, done::get);
}

In the parallel case this completely breaks down, since the lines can be processed out of order. Gate would have to ensure that didn't happen.

Pattern regex = ...;
try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) {
    r.lines().gate((line) -> !regex.matcher(line).matches()).forEach((line) -> ...process the line...);
}

If we wanted to cancel the operation asynchronously we would just call Stream.close(), and that should work with any stream.
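[For comparison, the gate/regex pipeline above is close in shape to a short-circuiting take-while operation. A sketch assuming Stream.takeWhile, which is a Java 9+ API rather than anything available in this discussion; unlike a plain filter, it stops consuming the source at the first match.]

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TakeWhileSketch {
    public static void main(String[] args) {
        Pattern stop = Pattern.compile("END");
        // takeWhile short-circuits the traversal: elements at and after the
        // first terminator are never pulled from the source.
        List<String> processed = Stream.of("a", "b", "END", "c")
                .takeWhile(line -> !stop.matcher(line).matches())
                .collect(Collectors.toList());
        System.out.println(processed); // [a, b]
    }
}
```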
In the current case we can't stop a stream from continuing to execute without explicitly adding a forEachUntil() at the end of it, with a condition variable that we then change out of band. Also, since there may be reduction operations in the middle, that may not even stop all of those operations from completing. This can be especially bad for things that should time out.

Sam

On Sat, Feb 9, 2013 at 11:36 AM, Joe Bowbeer wrote:
> I haven't used either of these. If I wanted to create an example I'd
> probably start with a stream of lines() from a BufferedReader, and then tack
> on a use case.
> [...]

From forax at univ-mlv.fr Sat Feb 9 15:24:39 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 10 Feb 2013 00:24:39 +0100
Subject: Internal and External truncation conditions
In-Reply-To:
References:
Message-ID: <5116DAB7.70809@univ-mlv.fr>

If forEachUntil takes a function that returns a boolean, it's easy.
try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) {
    return r.lines().parallel().forEachWhile(line -> {
        if (regex.matcher(line).matches()) {
            return false;
        }
        ...process the line...
        return true;
    });
}

cheers,
Rémi

On 02/09/2013 08:55 PM, Sam Pullara wrote:
> Let's say you only want to process lines until it matches a regex.
> Here is one way you could try to implement it:
> [...]
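[The boolean-returning forEachWhile sketched above can be approximated with the existing short-circuiting allMatch terminal operation, at the cost of putting side effects in a predicate. This is shown only as a comparison under that caveat, not as a recommendation, and it relies on sequential in-order evaluation.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class ForEachWhileSketch {
    public static void main(String[] args) {
        Pattern stop = Pattern.compile("END");
        List<String> processed = new ArrayList<>();
        // allMatch short-circuits on the first false, i.e. at the first line
        // matching the terminator, so later elements are never evaluated.
        Stream.of("a", "b", "END", "c").allMatch(line -> {
            if (stop.matcher(line).matches()) {
                return false;        // terminator found: stop the traversal
            }
            processed.add(line);     // "process the line" (side effect: the hack)
            return true;
        });
        System.out.println(processed); // [a, b]
    }
}
```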
Supposing we partition >>> a stream into >>> (a1,a2,a3), (a4,a5,a6) >>> >>> And suppose further we happen to be processing a5 when the bell goes >>> off. Do we want to wait for all a_i, i<5, to finish before letting the >>> computation quiesce? >>> >>> My gut says: for the things we intend to cancel, most of them will be >>> order-insensitive anyway. Things like: >>> >>> - Find the best possible move after thinking for 5 seconds >>> - Find the first solution that is better than X >>> - Gather solutions until we have 100 of them >>> >>> I believe the key use case for cancelation here will be when we are >>> chewing on potentially infinite streams of events (probably backed by >>> IO) where we want to chew until we're asked to shut down, and want to >>> get as much parallelism as we can cheaply. Which suggests to me the >>> intersection between order-sensitive stream pipelines and cancelable >>> stream pipelines is going to be pretty small indeed. >>> >>> Anyone want to add to this model of use cases for cancelation? >> From sam at sampullara.com Sat Feb 9 15:49:12 2013 From: sam at sampullara.com (Sam Pullara) Date: Sat, 9 Feb 2013 15:49:12 -0800 Subject: Internal and External truncation conditions In-Reply-To: <5116DAB7.70809@univ-mlv.fr> References: <5116DAB7.70809@univ-mlv.fr> Message-ID: I think the point of forEachUntil is that Brian doesn't want to do this as it has issues when parallelized. I think that it is an important enough use case that we handle it anyway. This is better than forEachUntil though. Sam On Sat, Feb 9, 2013 at 3:24 PM, Remi Forax wrote: > if forEachUntil takes a function that return a boolean, it's easy. 
> > > try (BufferedReader r = Files.newBufferedReader(path, > Charset.defaultCharset())) { > return r.lines().parallel().forEachWhile(line -> { > if (regex.matcher(line).matches()) { > return false; > } > ...process the line > return true; > } > } > > cheers, > Rémi > > > On 02/09/2013 08:55 PM, Sam Pullara wrote: >> >> Let's say you only want to process lines until it matches a regex. >> Here is one way you could try to implement it: >> >> AtomicBoolean done = new AtomicBoolean(); >> Pattern regex = ...; >> try (BufferedReader r = Files.newBufferedReader(path, >> Charset.defaultCharset())) { >> r.lines().forEachUntil( (line) -> { >> if (!done.get()) { >> if (regex.matcher(line).matches()) { >> done.set(true); >> } else { >> ...process the line... >> } >> } >> }, done::get); >> } >> >> In the parallel case this completely breaks down since the lines can >> be processed out of order. Gate would have to ensure that didn't >> happen. >> >> Pattern regex = ...; >> try (BufferedReader r = Files.newBufferedReader(path, >> Charset.defaultCharset())) { >> r.lines().gate((line) -> !regex.matcher(line).matches()).forEach((line) >> -> .. process line ..); >> } >> >> If we wanted to cancel the operation asynchronously we would just call >> Stream.close() and that should work with any stream. In the current >> case we can't stop a stream from continuing to execute without >> explicitly adding a forEachUntil() at the end of it with a condition >> variable that we then change out of band. Also, since there may be >> reduction operations in the middle, that may not even stop all of >> those operations from completing. This can be especially bad for >> things that should timeout. >> >> Sam >> >> On Sat, Feb 9, 2013 at 11:36 AM, Joe Bowbeer >> wrote: >>> >>> I haven't used either of these. If I wanted to create an example I'd >>> probably start with a stream of lines() from a BufferedReader, and then >>> tack >>> on a use case. 
>>> >>> try (BufferedReader r = Files.newBufferedReader(path, >>> Charset.defaultCharset())) { >>> r.lines().forEachUntil(...); >>> } >>> >>> >>> Do you have something specific in mind? >>> >>> >>> On Sat, Feb 9, 2013 at 11:26 AM, Sam Pullara wrote: >>>> >>>> Now that we are further along, I wanted to bring this up again. I >>>> don't think that forEachUntil is sufficient for handling internal and >>>> external conditions that should truncate stream processing. I've also >>>> looked at CloseableStream and that doesn't appear to help since it >>>> isn't possible to wrap a Stream (say an infinite stream) with a >>>> CloseableStream and get the necessary semantics of cancellation. Also, >>>> other APIs that don't consider that you might give them a >>>> CloseableStream will likely still give you back a Stream thus losing >>>> the semantics. >>>> >>>> Is everyone else happy with forEachUntil and CloseableStream? >>>> >>>> Sam >>>> >>>> ---------- Forwarded message ---------- >>>> From: Sam Pullara >>>> Date: Mon, Dec 31, 2012 at 8:34 AM >>>> Subject: Re: Cancelation -- use cases >>>> To: Brian Goetz >>>> Cc: "lambda-libs-spec-experts at openjdk.java.net" >>>> >>>> >>>> I think we are conflating two things with this solution and it doesn't >>>> work for them in my mind. Here is what I would like the solution to >>>> cover: >>>> >>>> - External conditions (cancellation, cleanup) >>>> - Internal conditions (gating based on count, elements and results) >>>> >>>> The first one may be the only one that works in the parallel case. It >>>> should likely be implemented with .close() on stream that would stop >>>> the stream as soon as possible. This would be useful for things like >>>> timeouts. Kind of like calling close on an inputstream in the middle >>>> of reading it. The other one I think is necessary and hard to >>>> implement correctly with the parallel case. For instance I would like >>>> to say: >>>> >>>> stream.gate(e -> e < 10).forEach(e -> ?) 
>>>> >>>> OR >>>> >>>> stream.gate( (e, i) -> e < 10 || i > 10).forEach(e -> ?) // i is the >>>> number of the current element >>>> >>>> That should give me every element in the stream until an element isn't >>>> < 10 and then stop processing elements. Further, there should be some >>>> way for the stream source to be notified that we are done consuming it >>>> in case it is of unknown length or consumes resources. That would be >>>> more like (assuming we add a Runnable call to Timer): >>>> >>>> Stream stream = ?. >>>> new Timer().schedule(() -> stream.close(), 5000); >>>> stream.forEach(e -> ?.); >>>> >>>> OR >>>> >>>> stream.forEach(e -> try { ? } catch() { stream.close() } ); >>>> >>>> Sadly, the first gate() case doesn't work well when parallelized. I'm >>>> willing to just specify what the behavior is for that case to get it >>>> into the API. For example, I would probably say something like "the >>>> gate will need to return false once per split to stop processing". In >>>> either of these cases I think one of the motivations needs to be that >>>> the stream may be using resources and we need to tell the source that >>>> we are done consuming it. For example, if the stream is sourced from a >>>> file, database or even a large amount of memory there should be a >>>> notification mechanism for doneness that will allow those resources to >>>> be returned before the stream is exhausted. To that end I think that >>>> Stream should implement AutoCloseable but overridden with no checked >>>> exception. >>>> >>>> interface Stream implements AutoCloseable { >>>> /** >>>> * Closes this stream and releases any system resources associated >>>> * with it. If the stream is already closed then invoking this >>>> * method has no effect. Close is automatically called when the >>>> * stream is exhausted. After this is called, no further elements >>>> * will be processed by the stream but currently processing elements >>>> * will complete normally. 
Calling other methods on a closed stream >>>> will >>>> * produce IllegalStateExceptions. >>>> */ >>>> void close(); >>>> >>>> /** >>>> * When the continueProcessing function returns false, no further >>>> * elements will be processed after the gate. In the parallel stream >>>> * case no further elements will be processed in the current split. >>>> */ >>>> Stream gate(Function until); >>>> >>>> /** >>>> * As gate with the addition of the current element number. >>>> */ >>>> Stream gate(BiFunction until); >>>> } >>>> >>>> This API avoids a lot of side effects that forEachUntil would require >>>> implement these use cases. >>>> >>>> Sam >>>> >>>> On Dec 30, 2012, at 7:53 PM, Brian Goetz wrote: >>>> >>>> Here's a lower-complexity version of cancel, that still satisfies (in >>>> series or in parallel) use cases like the following: >>>> >>>>> - Find the best possible move after thinking for 5 seconds >>>>> - Find the first solution that is better than X >>>>> - Gather solutions until we have 100 of them >>>> >>>> without bringing in the complexity or time/space overhead of dealing >>>> with encounter order. >>>> >>>> Since the forEach() operation works exclusively on the basis of >>>> temporal/arrival order rather than spatial/encounter order (elements >>>> are passed to the lambda in whatever order they are available, in >>>> whatever thread they are available), we could make a canceling variant >>>> of forEach: >>>> >>>> .forEachUntil(Block sink, BooleanSupplier until) >>>> >>>> Here, there is no confusion about what happens in the ordered case, no >>>> need to buffer elements, etc. Elements flow into the block until the >>>> termination condition transpires, at which point there are no more >>>> splits and existing splits dispense no more elements. 
>>>> >>>> I implemented this (it was trivial) and wrote a simple test program to >>>> calculate primes sequentially and in parallel, counting how many could >>>> be calculated in a fixed amount of time, starting from an infinite >>>> generator and filtering out composites: >>>> >>>> Streams.iterate(from, i -> i + 1) // sequential >>>> .filter(i -> isPrime(i)) >>>> .forEachUntil(i -> { >>>> chm.put(i, true); >>>> }, () -> System.currentTimeMillis() >= start+num); >>>> >>>> vs >>>> >>>> Streams.iterate(from, i -> i+1) // parallel >>>> .parallel() >>>> .filter(i -> isPrime(i)) >>>> .forEachUntil(i -> { >>>> chm.put(i, true); >>>> }, () -> System.currentTimeMillis() >= start+num); >>>> >>>> On a 4-core Q6600 system, in a fixed amount of time, the parallel >>>> version gathered ~3x as many primes. >>>> >>>> In terms of being able to perform useful computations on infinite >>>> streams, this seems a pretty attractive price-performer; lower spec >>>> and implementation complexity, and covers many of the use cases which >>>> would otherwise be impractical to attack with the stream approach. >>>> >>>> >>>> >>>> On 12/28/2012 11:20 AM, Brian Goetz wrote: >>>> >>>> I've been working through some alternatives for cancellation support in >>>> infinite streams. Looking to gather some use case background to help >>>> evaluate the alternatives. >>>> >>>> In the serial case, the "gate" approach works fine -- after some >>>> criteria transpires, stop sending elements downstream. The pipeline >>>> flushes the elements it has, and completes early. >>>> >>>> In the parallel unordered case, the gate approach similarly works fine >>>> -- after the cancelation criteria occurs, no new splits are created, and >>>> existing splits dispense no more elements. The computation similarly >>>> quiesces after elements currently being processed are completed, >>>> possibly along with any up-tree merging to combine results. >>>> >>>> It is the parallel ordered case that is tricky. 
Supposing we partition >>>> a stream into >>>> (a1,a2,a3), (a4,a5,a6) >>>> >>>> And suppose further we happen to be processing a5 when the bell goes >>>> off. Do we want to wait for all a_i, i<5, to finish before letting the >>>> computation quiesce? >>>> >>>> My gut says: for the things we intend to cancel, most of them will be >>>> order-insensitive anyway. Things like: >>>> >>>> - Find the best possible move after thinking for 5 seconds >>>> - Find the first solution that is better than X >>>> - Gather solutions until we have 100 of them >>>> >>>> I believe the key use case for cancelation here will be when we are >>>> chewing on potentially infinite streams of events (probably backed by >>>> IO) where we want to chew until we're asked to shut down, and want to >>>> get as much parallelism as we can cheaply. Which suggests to me the >>>> intersection between order-sensitive stream pipelines and cancelable >>>> stream pipelines is going to be pretty small indeed. >>>> >>>> Anyone want to add to this model of use cases for cancelation? >>> >>> > From zhong.j.yu at gmail.com Sat Feb 9 20:25:12 2013 From: zhong.j.yu at gmail.com (Zhong Yu) Date: Sat, 9 Feb 2013 22:25:12 -0600 Subject: Internal and External truncation conditions In-Reply-To: References: Message-ID: Based on my own use cases, code that needs forEachUntil() usually intends to process just enough elements to produce a result, for example, a lexer scans a char stream until it yields a token. In that sense forEachUntil() is really an aggregator for *some* elements. We may have a method in the form of interface Stream R scan(Function scanner) The scanner is usually stateful. Elements are fed to the scanner, until it returns a non-null value; that value is the return value of scan(). If end of stream is reached before scanner returns non-null, scan() returns null. A scanner may need to react to EOF event, the application can design an EOF sentinel of type T. 
In the parallel case, scanner must be thread-safe; if it returns non-null for one split, it should return non-null for all splits at around the same time; one of the non-null values is chosen arbitrarily as the result of scan(). If null sentinel is too distasteful, scanner can return Optional; or it can yield result into a Consumer sink. Examples: Collection primes = ints.parallel().scan( gather primes till xxx ); Paragraph para = lines.scan( gather lines till an empty line or EOF ); scan() is only intended for part of the stream. To turn the whole stream into another stream, say a line stream into a paragraph stream, flatMap(FlatMapper) should work just fine. Zhong Yu On Sat, Feb 9, 2013 at 1:26 PM, Sam Pullara wrote: > Now that we are further along, I wanted to bring this up again. I > don't think that forEachUntil is sufficient for handling internal and > external conditions that should truncate stream processing. I've also > looked at CloseableStream and that doesn't appear to help since it > isn't possible to wrap a Stream (say an infinite stream) with a > CloseableStream and get the necessary semantics of cancellation. Also, > other APIs that don't consider that you might give them a > CloseableStream will likely still give you back a Stream thus losing > the semantics. > > Is everyone else happy with forEachUntil and CloseableStream? > > Sam > > ---------- Forwarded message ---------- > From: Sam Pullara > Date: Mon, Dec 31, 2012 at 8:34 AM > Subject: Re: Cancelation -- use cases > To: Brian Goetz > Cc: "lambda-libs-spec-experts at openjdk.java.net" > > > I think we are conflating two things with this solution and it doesn't > work for them in my mind. Here is what I would like the solution to > cover: > > - External conditions (cancellation, cleanup) > - Internal conditions (gating based on count, elements and results) > > The first one may be the only one that works in the parallel case. 
It > should likely be implemented with .close() on stream that would stop > the stream as soon as possible. This would be useful for things like > timeouts. Kind of like calling close on an inputstream in the middle > of reading it. The other one I think is necessary and hard to > implement correctly with the parallel case. For instance I would like > to say: > > stream.gate(e -> e < 10).forEach(e -> ?) > > OR > > stream.gate( (e, i) -> e < 10 || i > 10).forEach(e -> ?) // i is the > number of the current element > > That should give me every element in the stream until an element isn't > < 10 and then stop processing elements. Further, there should be some > way for the stream source to be notified that we are done consuming it > in case it is of unknown length or consumes resources. That would be > more like (assuming we add a Runnable call to Timer): > > Stream stream = ?. > new Timer().schedule(() -> stream.close(), 5000); > stream.forEach(e -> ?.); > > OR > > stream.forEach(e -> try { ? } catch() { stream.close() } ); > > Sadly, the first gate() case doesn't work well when parallelized. I'm > willing to just specify what the behavior is for that case to get it > into the API. For example, I would probably say something like "the > gate will need to return false once per split to stop processing". In > either of these cases I think one of the motivations needs to be that > the stream may be using resources and we need to tell the source that > we are done consuming it. For example, if the stream is sourced from a > file, database or even a large amount of memory there should be a > notification mechanism for doneness that will allow those resources to > be returned before the stream is exhausted. To that end I think that > Stream should implement AutoCloseable but overridden with no checked > exception. > > interface Stream implements AutoCloseable { > /** > * Closes this stream and releases any system resources associated > * with it. 
If the stream is already closed then invoking this > * method has no effect. Close is automatically called when the > * stream is exhausted. After this is called, no further elements > * will be processed by the stream but currently processing elements > * will complete normally. Calling other methods on a closed stream will > * produce IllegalStateExceptions. > */ > void close(); > > /** > * When the continueProcessing function returns false, no further > * elements will be processed after the gate. In the parallel stream > * case no further elements will be processed in the current split. > */ > Stream gate(Function until); > > /** > * As gate with the addition of the current element number. > */ > Stream gate(BiFunction until); > } > > This API avoids a lot of side effects that forEachUntil would require > implement these use cases. > > Sam > > On Dec 30, 2012, at 7:53 PM, Brian Goetz wrote: > > Here's a lower-complexity version of cancel, that still satisfies (in > series or in parallel) use cases like the following: > >> - Find the best possible move after thinking for 5 seconds >> - Find the first solution that is better than X >> - Gather solutions until we have 100 of them > > without bringing in the complexity or time/space overhead of dealing > with encounter order. > > Since the forEach() operation works exclusively on the basis of > temporal/arrival order rather than spatial/encounter order (elements > are passed to the lambda in whatever order they are available, in > whatever thread they are available), we could make a canceling variant > of forEach: > > .forEachUntil(Block sink, BooleanSupplier until) > > Here, there is no confusion about what happens in the ordered case, no > need to buffer elements, etc. Elements flow into the block until the > termination condition transpires, at which point there are no more > splits and existing splits dispense no more elements. 
> > I implemented this (it was trivial) and wrote a simple test program to > calculate primes sequentially and in parallel, counting how many could > be calculated in a fixed amount of time, starting from an infinite > generator and filtering out composites: > > Streams.iterate(from, i -> i + 1) // sequential > .filter(i -> isPrime(i)) > .forEachUntil(i -> { > chm.put(i, true); > }, () -> System.currentTimeMillis() >= start+num); > > vs > > Streams.iterate(from, i -> i+1) // parallel > .parallel() > .filter(i -> isPrime(i)) > .forEachUntil(i -> { > chm.put(i, true); > }, () -> System.currentTimeMillis() >= start+num); > > On a 4-core Q6600 system, in a fixed amount of time, the parallel > version gathered ~3x as many primes. > > In terms of being able to perform useful computations on infinite > streams, this seems a pretty attractive price-performer; lower spec > and implementation complexity, and covers many of the use cases which > would otherwise be impractical to attack with the stream approach. > > > > On 12/28/2012 11:20 AM, Brian Goetz wrote: > > I've been working through some alternatives for cancellation support in > infinite streams. Looking to gather some use case background to help > evaluate the alternatives. > > In the serial case, the "gate" approach works fine -- after some > criteria transpires, stop sending elements downstream. The pipeline > flushes the elements it has, and completes early. > > In the parallel unordered case, the gate approach similarly works fine > -- after the cancelation criteria occurs, no new splits are created, and > existing splits dispense no more elements. The computation similarly > quiesces after elements currently being processed are completed, > possibly along with any up-tree merging to combine results. > > It is the parallel ordered case that is tricky. Supposing we partition > a stream into > (a1,a2,a3), (a4,a5,a6) > > And suppose further we happen to be processing a5 when the bell goes > off. 
Do we want to wait for all a_i, i<5, to finish before letting the > computation quiesce? > > My gut says: for the things we intend to cancel, most of them will be > order-insensitive anyway. Things like: > > - Find the best possible move after thinking for 5 seconds > - Find the first solution that is better than X > - Gather solutions until we have 100 of them > > I believe the key use case for cancelation here will be when we are > chewing on potentially infinite streams of events (probably backed by > IO) where we want to chew until we're asked to shut down, and want to > get as much parallelism as we can cheaply. Which suggests to me the > intersection between order-sensitive stream pipelines and cancelable > stream pipelines is going to be pretty small indeed. > > Anyone want to add to this model of use cases for cancelation? From zhong.j.yu at gmail.com Sat Feb 9 20:59:10 2013 From: zhong.j.yu at gmail.com (Zhong Yu) Date: Sat, 9 Feb 2013 22:59:10 -0600 Subject: Internal and External truncation conditions In-Reply-To: References: Message-ID: On Sat, Feb 9, 2013 at 10:25 PM, Zhong Yu wrote: > Based on my own use cases, code that needs forEachUntil() usually > intends to process just enough elements to produce a result, for > example, a lexer scans a char stream until it yields a token. In that > sense forEachUntil() is really an aggregator for *some* elements. We > may have a method in the form of > > interface Stream > > R scan(Function scanner) > > The scanner is usually stateful. Elements are fed to the scanner, > until it returns a non-null value; that value is the return value of > scan(). If end of stream is reached before scanner returns non-null, > scan() returns null. A scanner may need to react to EOF event, the > application can design an EOF sentinel of type T. 
> > In the parallel case, scanner must be thread-safe; if it returns > non-null for one split, it should return non-null for all splits at > around the same time; one of the non-null values is chosen arbitrarily > as the result of scan(). > > If null sentinel is too distasteful, scanner can return Optional; > or it can yield result into a Consumer sink. > > Examples: > > Collection primes = ints.parallel().scan( gather primes till xxx ); > > Paragraph para = lines.scan( gather lines till an empty line or EOF ); > > scan() is only intended for part of the stream. To turn the whole > stream into another stream, say a line stream into a paragraph stream, > flatMap(FlatMapper) should work just fine. Actually, scan() can be defined in term of flatMap(FlatMapper mapper).findFirst() the mapper is stateful; it gathers some elements then yields a result to the sink. The scan() method, though providing the same functionality, is more clear about the intention of the programmer. Zhong Yu From dl at cs.oswego.edu Sun Feb 10 05:12:18 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 10 Feb 2013 08:12:18 -0500 Subject: Internal and External truncation conditions In-Reply-To: <5116DAB7.70809@univ-mlv.fr> References: <5116DAB7.70809@univ-mlv.fr> Message-ID: <51179CB2.3000502@cs.oswego.edu> On 02/09/13 18:24, Remi Forax wrote: > if forEachUntil takes a function that return a boolean, it's easy. > > try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) { > return r.lines().parallel().forEachWhile(element -> { > if (regex.matcher(line).matches()) { > return false; > } > ...process the line > return true; > } > } > Which then becomes a variant of what I do in ConcurrentHashMap search{InParallel,Sequentially}, that applies to not only this but several other usage contexts: /** * Returns a non-null result from applying the given search * function on each (key, value), or null if none. 
Upon * success, further element processing is suppressed and the * results of any other parallel invocations of the search * function are ignored. * * @param searchFunction a function returning a non-null * result on success, else null * @return a non-null result from applying the given search * function on each (key, value), or null if none */ You'd use this here with a function that processed if a match (returning null) else returning the first non-match. Or rework in any of a couple of ways to similar effect. This works well in CHM because of its nullness policy. Which allows only this single method to serve as the basis for all possible short-circuit/cancel applications. It is so handy when nulls cannot be actual elements that it might be worth supporting instead of forEachUntil? People using it would need to ensure non-null elements. Just a thought. While I'm at it: Sam seems to be asking for asynchronous cancellation of bulk operations. I can't get myself to appreciate the utility of doing this. JDK/j.u.c supports several other ways (especially including the upcoming CompletableFutures) to carefully yet relatively conveniently arrange/manage cancellation, especially in IO-related contexts in which they most often arise. None of them explicitly address bulk computations (although any of them can do a bulk computation within a task). This is a feature, not a bug. If you are processing lots of elements, then only you know the responsiveness vs overhead tradeoffs of checking for async cancel status. Requiring that all Stream bulk computations like reduce continuously check for async cancel status between each per-element operation is unlikely to satisfy anyone at all, yet seems to be the only defensible option if we were to support it. 
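The search idiom Doug describes shipped in the Java 8 API as ConcurrentHashMap.search(parallelismThreshold, searchFunction); a small sketch of the non-null convention (the helper name is illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

public class ChmSearchDemo {
    // Returns a key mapped to the wanted value, or null if none.
    // A non-null return from the search function suppresses further
    // element processing; traversal order is unspecified.
    static Integer findKeyWithValue(ConcurrentHashMap<Integer, String> map, String wanted) {
        return map.search(4L, (k, v) -> wanted.equals(v) ? k : null);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(findKeyWithValue(map, "value-42")); // 42
        System.out.println(findKeyWithValue(map, "missing"));  // null
    }
}
```

The parallelismThreshold argument (here 4) controls how large the map must be before the search is split across threads; Long.MAX_VALUE forces a sequential search.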
-Doug From forax at univ-mlv.fr Sun Feb 10 05:46:08 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 10 Feb 2013 14:46:08 +0100 Subject: Internal and External truncation conditions In-Reply-To: <51179CB2.3000502@cs.oswego.edu> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> Message-ID: <5117A4A0.5000208@univ-mlv.fr> On 02/10/2013 02:12 PM, Doug Lea wrote: > On 02/09/13 18:24, Remi Forax wrote: >> if forEachUntil takes a function that return a boolean, it's easy. >> >> try (BufferedReader r = Files.newBufferedReader(path, >> Charset.defaultCharset())) { >> return r.lines().parallel().forEachWhile(element -> { >> if (regex.matcher(line).matches()) { >> return false; >> } >> ...process the line >> return true; >> } >> } >> > > Which then becomes a variant of what I do in ConcurrentHashMap > search{InParallel,Sequentially}, that applies to not only this > but several other usage contexts: > > /** > * Returns a non-null result from applying the given search > * function on each (key, value), or null if none. Upon > * success, further element processing is suppressed and the > * results of any other parallel invocations of the search > * function are ignored. > * > * @param searchFunction a function returning a non-null > * result on success, else null > * @return a non-null result from applying the given search > * function on each (key, value), or null if none > */ > > You'd use this here with a function that processed if > a match (returning null) else returning the first non-match. > Or rework in any of a couple of ways to similar effect. > > This works well in CHM because of its nullness policy. > Which allows only this single method to serve as the basis > for all possible short-circuit/cancel applications. > It is so handy when nulls cannot be actual elements > that it might be worth supporting instead of forEachUntil? > People using it would need to ensure non-null elements. > Just a thought. 
yes, findFirst and forEachWhile/forEachUntil are the same operation from the implementation point of view if you have a value (not necessarily null) that says NO_VALUE. Now, I think it's an implementation detail and that from the user point of view we should provide them both. > > -Doug > > Rémi From forax at univ-mlv.fr Sun Feb 10 05:47:39 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 10 Feb 2013 14:47:39 +0100 Subject: Internal and External truncation conditions In-Reply-To: <5117A3D8.9040509@cs.oswego.edu> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> <5117A3D8.9040509@cs.oswego.edu> Message-ID: <5117A4FB.7010609@univ-mlv.fr> On 02/10/2013 02:42 PM, Doug Lea wrote: > On 02/10/13 08:12, Doug Lea wrote: > >> Requiring that all Stream bulk computations like reduce >> continuously check for async cancel status between each >> per-element operation is unlikely to satisfy anyone at all, >> yet seems to be the only defensible option if we were to >> support it. >> > > Actually, we already support it. > Any per-element lambda supplied to any Stream method can > itself do any kind of async cancel check itself, and throw > an exception rather than returning a result. 
>> >> Case closed? > > No, throwing an exception when the VM thinks that it can escape is really slow. > > -Doug > > Rémi From dl at cs.oswego.edu Sun Feb 10 06:02:21 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 10 Feb 2013 09:02:21 -0500 Subject: Internal and External truncation conditions In-Reply-To: <5117A4FB.7010609@univ-mlv.fr> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> <5117A3D8.9040509@cs.oswego.edu> <5117A4FB.7010609@univ-mlv.fr> Message-ID: <5117A86D.8090707@cs.oswego.edu> On 02/10/13 08:47, Remi Forax wrote: > On 02/10/2013 02:42 PM, Doug Lea wrote: > >>> Any per-element lambda supplied to any Stream method can >>> itself do any kind of async cancel check itself, and throw >>> an exception rather than returning a result. >>> >>> Case closed? >> >> No, throwing an exception when the VM thinks that it can escape >> is really slow. >> > > That's my point exactly! If you want to slow down bulk ops > for the sake of responsiveness, then you should be aware of > the tradeoffs. In practice, fine-grained cancel-checks > are rarely worthwhile (you'd often finish 10 times faster, > and thus usually not need to cancel, without the checks). > But it should be the user's decision, not ours. > Otherwise, we cannot internally arrange/support > cancellation any faster than users can, but would > penalize ALL users for the sake of those who need it. 
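Doug's point that users can already arrange cancellation themselves can be sketched as a per-element check against a flag, aborting via an unchecked exception; the exception class and helper below are illustrative, not a JDK API:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.IntStream;

public class UserCancelDemo {
    // Illustrative unchecked exception used to abort a bulk operation.
    static class CancelledException extends RuntimeException {}

    // Processes elements from an infinite stream until the flag is set,
    // then bails out of forEach by throwing.
    static int countUntilCancelled(AtomicBoolean cancelled, int limit) {
        int[] seen = {0};
        try {
            IntStream.iterate(0, i -> i + 1).forEach(i -> {
                if (cancelled.get()) throw new CancelledException();
                seen[0]++;
                if (seen[0] >= limit) cancelled.set(true); // simulate an async cancel
            });
        } catch (CancelledException e) {
            // computation quiesces here
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(countUntilCancelled(new AtomicBoolean(), 100)); // 100
    }
}
```

This is exactly the trade-off under discussion: the per-element check and the thrown exception cost something, but only the pipelines that opt in pay for it.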
yes, you can; it's exactly what j.l.i.SwitchPoint does.
Note that we can't transform the whole pipeline to a method handle tree because we have no loopy method handle now, but if we have that, you can create a method handle tree corresponding to the pipeline, and when the code is JITed (with the new lambda forms, it will be), the check will disappear; and if a user calls cancel, the JITed code will be trashed and the execution will go back into the interpreter, which will do the check.

> -Doug

Rémi

From dl at cs.oswego.edu Sun Feb 10 06:42:56 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 10 Feb 2013 09:42:56 -0500
Subject: Internal and External truncation conditions
In-Reply-To: <5117AE83.30206@univ-mlv.fr>
References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> <5117A3D8.9040509@cs.oswego.edu> <5117A4FB.7010609@univ-mlv.fr> <5117A86D.8090707@cs.oswego.edu> <5117AE83.30206@univ-mlv.fr>
Message-ID: <5117B1F0.6060806@cs.oswego.edu>

On 02/10/13 09:28, Remi Forax wrote:
> yes, you can; it's exactly what j.l.i.SwitchPoint does.
> Note that we can't transform the whole pipeline to a method handle tree because
> we have no loopy method handle now, but if we have that, you can create a method
> handle tree corresponding to the pipeline, and when the code is JITed (with the
> new lambda forms, it will be), the check will disappear; and if a user calls
> cancel, the JITed code will be trashed and the execution will go back into the
> interpreter, which will do the check.

Which amounts to, at best, an approximation of the rare-trap mechanics that would be used for an explicit check in user code if the handles are fully resolved?
-Doug From dl at cs.oswego.edu Sun Feb 10 05:42:48 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 10 Feb 2013 08:42:48 -0500 Subject: Internal and External truncation conditions In-Reply-To: <51179CB2.3000502@cs.oswego.edu> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> Message-ID: <5117A3D8.9040509@cs.oswego.edu> On 02/10/13 08:12, Doug Lea wrote: > Requiring that all Stream bulk computations like reduce > continuously check for async cancel status between each > per-element operation is unlikely to satisfy anyone at all, > yet seems to be the only defensible option if we were to > support it. > Actually, we already support it. Any per-element lambda supplied to any Stream method can itself do any kind of async cancel check itself, and throw an exception rather than returning a result. Case closed? -Doug From zhong.j.yu at gmail.com Sun Feb 10 08:30:33 2013 From: zhong.j.yu at gmail.com (Zhong Yu) Date: Sun, 10 Feb 2013 10:30:33 -0600 Subject: Internal and External truncation conditions In-Reply-To: <51179CB2.3000502@cs.oswego.edu> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> Message-ID: On Sun, Feb 10, 2013 at 7:12 AM, Doug Lea
wrote:
> On 02/09/13 18:24, Remi Forax wrote:
>>
>> if forEachUntil takes a function that returns a boolean, it's easy.
>>
>> try (BufferedReader r = Files.newBufferedReader(path,
>>     Charset.defaultCharset())) {
>>   return r.lines().parallel().forEachWhile(line -> {
>>     if (regex.matcher(line).matches()) {
>>       return false;
>>     }
>>     ...process the line
>>     return true;
>>   }
>> }
>
> Which then becomes a variant of what I do in ConcurrentHashMap
> search{InParallel,Sequentially}, that applies to not only this
> but several other usage contexts:
>
> /**
>  * Returns a non-null result from applying the given search
>  * function on each (key, value), or null if none. Upon
>  * success, further element processing is suppressed and the
>  * results of any other parallel invocations of the search
>  * function are ignored.
>  *
>  * @param searchFunction a function returning a non-null
>  * result on success, else null
>  * @return a non-null result from applying the given search
>  * function on each (key, value), or null if none
>  */
>
> You'd use this here with a function that processed if
> a match (returning null), else returned the first non-match.
> Or rework in any of a couple of ways to similar effect.
>
> This works well in CHM because of its nullness policy.
> Which allows only this single method to serve as the basis
> for all possible short-circuit/cancel applications.
> It is so handy when nulls cannot be actual elements
> that it might be worth supporting instead of forEachUntil?
> People using it would need to ensure non-null elements.
> Just a thought.

null is fine if we use Optional: Optional search(Function)

> While I'm at it:
>
> Sam seems to be asking for asynchronous cancellation of bulk
> operations. I can't get myself to appreciate the utility of
> doing this.
JDK/j.u.c supports several other ways (especially > including the upcoming CompletableFutures) to carefully yet > relatively conveniently arrange/manage cancellation, especially > in IO-related contexts in which they most often arise. None > of them explicitly address bulk computations (although any > of them can do a bulk computation within a task). This is > a feature, not a bug. If you are processing lots > of elements, then only you know the responsiveness vs > overhead tradeoffs of checking for async cancel status. > > Requiring that all Stream bulk computations like reduce > continuously check for async cancel status between each > per-element operation is unlikely to satisfy anyone at all, > yet seems to be the only defensible option if we were to > support it. > > -Doug > > From forax at univ-mlv.fr Sun Feb 10 09:25:20 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 10 Feb 2013 18:25:20 +0100 Subject: Spliterator.tryAdvance Message-ID: <5117D800.9090708@univ-mlv.fr> Playing a little bit with how findFirst/forEachUntil can be implemented on top of a Spliterator, I think that tryAdvance should be changed to be able to return a value produced in the middle of the consumer taken by tryAdvance. I would like to have tryAdvance to be like this: /** * Sentinel value used by tryAdvance to signal that there is no more element. */ public static final Object END = new Object(); /** * If a remaining element exists, performs the given action on it, * returning the result of the function, otherwise returns {@code END}. * * @param action The action to perform. * @return {@code END} if no remaining elements existed * upon entry to this method, else the return value of the action. 
 */
Object tryAdvance(Function action);

and forEach is a little bit uglier:

Function action = element -> { consumer.accept(element); return null; };
do {} while (tryAdvance(action) != END);

with that, there is no need to use a side value to express the fact that we have already found the resulting value, because we can return it as the return value of tryAdvance.

cheers,
Rémi

From forax at univ-mlv.fr Sun Feb 10 09:33:24 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 10 Feb 2013 18:33:24 +0100
Subject: Internal and External truncation conditions
In-Reply-To: <5117B1F0.6060806@cs.oswego.edu>
References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> <5117A3D8.9040509@cs.oswego.edu> <5117A4FB.7010609@univ-mlv.fr> <5117A86D.8090707@cs.oswego.edu> <5117AE83.30206@univ-mlv.fr> <5117B1F0.6060806@cs.oswego.edu>
Message-ID: <5117D9E4.6040008@univ-mlv.fr>

On 02/10/2013 03:42 PM, Doug Lea wrote:
> On 02/10/13 09:28, Remi Forax wrote:
>
>> yes, you can; it's exactly what j.l.i.SwitchPoint does.
>> Note that we can't transform the whole pipeline to a method handle
>> tree because
>> we have no loopy method handle now, but if we have that, you can
>> create a method
>> handle tree corresponding to the pipeline, and when the code is JITed
>> (with the new lambda forms, it will be), the check will disappear;
>> and if a user calls cancel,
>> the JITed code will be trashed and the execution will go back into the
>> interpreter, which will do the check.
>
> Which amounts to, at best, an approximation of the rare-trap mechanics
> that would be used for an explicit check in user code if the handles
> are fully resolved?

It depends on what "fully resolved handles" means and how the loopy method handle is implemented. If it's implemented like OSR, there is a check when interpreting the code just before doing the backward jump.
When JITed, there is no supplementary cost because the JIT has to insert a GC safepoint check (a read to a well known page) and the cancellation mechanism can re-use the very same read.

> -Doug

Rémi

From dl at cs.oswego.edu Sun Feb 10 13:00:27 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 10 Feb 2013 16:00:27 -0500
Subject: Spliterator.tryAdvance
In-Reply-To: <5117D800.9090708@univ-mlv.fr>
References: <5117D800.9090708@univ-mlv.fr>
Message-ID: <51180A6B.4020503@cs.oswego.edu>

On 02/10/13 12:25, Remi Forax wrote:
> Playing a little bit with how findFirst/forEachUntil can be implemented on top
> of a Spliterator,
> I think that tryAdvance should be changed to be able to return a value produced
> in the middle of the consumer taken by tryAdvance.

Brian and I spent a while on this theme, of only supporting forEach and some variant of the search method I mentioned. If we had nonnull-element guarantees, it would be an easier call: just use CHM-like search. The primitive int/long/double versions would need a boxed return value but these could sometimes be optimized away in practice. All in all seems pretty good. But when you also allow nullable elements, it means that every call is guaranteed to create a nuisance object, which makes it less attractive than single-step tryAdvance as the basic workhorse underlying a lot of bulk computations.

> /**
>  * Sentinel value used by tryAdvance to signal that there is no more element.
>  */
> public static final Object END = new Object();

No can do. (Primitives.)
-Doug

From forax at univ-mlv.fr Sun Feb 10 14:19:31 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 10 Feb 2013 23:19:31 +0100
Subject: Spliterator.tryAdvance
In-Reply-To: <51180A6B.4020503@cs.oswego.edu>
References: <5117D800.9090708@univ-mlv.fr> <51180A6B.4020503@cs.oswego.edu>
Message-ID: <51181CF3.9000708@univ-mlv.fr>

On 02/10/2013 10:00 PM, Doug Lea wrote:
> On 02/10/13 12:25, Remi Forax wrote:
>> Playing a little bit with how findFirst/forEachUntil can be
>> implemented on top
>> of a Spliterator,
>> I think that tryAdvance should be changed to be able to return a
>> value produced
>> in the middle of the consumer taken by tryAdvance.
>
> Brian and I spent a while on this theme, of only supporting
> forEach and some variant of the search method I mentioned.
> If we had nonnull-element guarantees, it would be an easier call:
> just use CHM-like search. The primitive int/long/double
> versions would need a boxed return value but these could
> sometimes be optimized away in practice. All in all seems
> pretty good. But when you also allow nullable elements,
> it means that every call is guaranteed to create a nuisance
> object, which makes it less attractive than single-step
> tryAdvance as the basic workhorse underlying a lot of bulk
> computations.

You got it wrong, I think. What I propose is that tryAdvance is not a search-like operation but a single-step operation. And when it calls the action at the end, the action is able to back-propagate a resulting value. You can see it as a way to abstract the way a read on an input works: either you get the number of bytes read or you get -1. Here, with tryAdvance, either you get the return value of the action or you get END; you still have to call tryAdvance several times to consume the whole stream. About null: given that the return value can only come from the consumer, the return value can be null or not depending on what the user has specified as action.
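[Editor's note: the single-step, value-returning tryAdvance protocol Rémi describes can be sketched standalone. Everything below — the Cursor class, its method names, and the findFirstEven driver — is a hypothetical illustration of the idea, not the proposed Spliterator API.]

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class TryAdvanceSketch {
    /** Sentinel meaning "no more elements", as in Rémi's proposal. */
    static final Object END = new Object();

    // Minimal single-step cursor over a list: each call consumes at most one
    // element and returns whatever the action returned for it, or END.
    static class Cursor<T> {
        private final Iterator<T> it;
        Cursor(List<T> list) { this.it = list.iterator(); }

        Object tryAdvance(Function<? super T, ?> action) {
            if (!it.hasNext()) {
                return END;
            }
            return action.apply(it.next());
        }
    }

    // A findFirst-style search built on the value-returning tryAdvance:
    // the action returns the element on a match and null otherwise, so no
    // mutable side variable is needed to carry the result out of the lambda.
    static Integer findFirstEven(List<Integer> list) {
        Cursor<Integer> c = new Cursor<>(list);
        Object r;
        while ((r = c.tryAdvance(e -> e % 2 == 0 ? e : null)) != END) {
            if (r != null) {
                return (Integer) r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findFirstEven(Arrays.asList(1, 3, 4, 5, 6))); // 4
    }
}
```

Note this sketch dodges the primitive-specialization objection Doug raises by using boxed elements; it only illustrates how the sentinel lets a result flow back through the per-step call.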
>
>> /**
>>  * Sentinel value used by tryAdvance to signal that there is no
>>  * more element.
>>  */
>> public static final Object END = new Object();
>
> No can do. (Primitives.)

If the primitive value is one which is used in a reduce, yes, it's true, it can not do that, but anyway, you can not send the reduced value to the action too, or you need to create a new action at each call. Otherwise, the stream API uses Optional as box, so the stream API already requires the implementation to box the value.

> -Doug

Rémi

From forax at univ-mlv.fr Mon Feb 11 07:34:56 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 11 Feb 2013 16:34:56 +0100
Subject: Spliterator.tryAdvance
Message-ID: <51190FA0.3020601@univ-mlv.fr>

There is another point: the specification should be relaxed to allow tryAdvance to not always call the consumer taken as parameter.

If, for example, I want to implement a Spliterator that filters the elements, this implementation should be legal:

class FilterSpliterator<T> implements Spliterator<T> {
  private final Spliterator<T> spliterator;
  private final Predicate<? super T> predicate;

  public FilterSpliterator(Spliterator<T> spliterator, Predicate<? super T> predicate) {
    ....
  }

  public void tryAdvance(Consumer<T> consumer) {
    spliterator.tryAdvance(element -> {
      if (predicate.test(element)) {
        consumer.accept(element);
      }
    });
  }
}

otherwise, you have to use a while loop around spliterator.tryAdvance, but because there is no way to transmit the information that the element is accepted or not (see my previous mail), you can not use a lambda here and you have to rely on an inner class.
cheers,
Rémi

From brian.goetz at oracle.com Mon Feb 11 08:41:33 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 11 Feb 2013 11:41:33 -0500
Subject: Reducing reduce
Message-ID: <51191F3D.4090203@oracle.com>

Now that we've added all the shapes of map() to Stream (map to ref/int/long/double), and we've separated functional reduce (currently called reduce) from mutable reduce (currently called collect), I think that leaves room for taking out one of the reduce methods from Stream:

<U> U reduce(U identity,
             BiFunction<U, ? super T, U> accumulator,
             BinaryOperator<U> reducer);

This is the one that confuses everyone anyway, and I don't think we need it any more.

The arguments for having this form instead of discrete map+reduce are:
- fused map+reduce reduces boxing
- this three-arg form can also fold filtering into the accumulation

However, since we now have primitive-bearing map methods, and we can do filtering before and after the map, is this form really carrying its weight? Specifically because people find it counterintuitive, we should consider dropping it and guiding people towards map+reduce.

For example, "sum of pages" over a stream of Documents is better written as:

docs.map(Document::getPageCount).sum()

rather than

docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum)

The big place where we need three-arg reduce is when we're folding into a mutable store. But that's now handled by collect().

Have I missed any use cases that would justify keeping this form?

From kevinb at google.com Mon Feb 11 08:55:06 2013
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 11 Feb 2013 08:55:06 -0800
Subject: Reducing reduce
In-Reply-To: <51191F3D.4090203@oracle.com>
References: <51191F3D.4090203@oracle.com>
Message-ID: 

+1, please drop it.
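[Editor's note: the two forms Brian compares, side by side as runnable code. `Document` here is a stand-in class invented for the example, and the sketch spells the primitive-mapping op `mapToInt` (the name the released API settled on) rather than the prototype's `map`.]

```java
import java.util.Arrays;
import java.util.List;

public class ReduceForms {
    static class Document {
        final int pageCount;
        Document(int pageCount) { this.pageCount = pageCount; }
        int getPageCount() { return pageCount; }
    }

    // Discrete map+reduce: map each Document to a primitive int, then sum.
    static int sumPagesMapped(List<Document> docs) {
        return docs.stream().mapToInt(Document::getPageCount).sum();
    }

    // The three-arg reduce form under discussion: identity, accumulator
    // (folds one Document into the running int), and combiner.
    static int sumPagesReduced(List<Document> docs) {
        return docs.stream().reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum);
    }

    public static void main(String[] args) {
        List<Document> docs = Arrays.asList(new Document(3), new Document(5), new Document(7));
        System.out.println(sumPagesMapped(docs));  // 15
        System.out.println(sumPagesReduced(docs)); // 15
    }
}
```

Both forms compute the same sum; the debate below is about whether the fused form's ability to elide work per element justifies its extra conceptual weight.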
On Mon, Feb 11, 2013 at 8:41 AM, Brian Goetz wrote: > Now that we've added all the shapes of map() to Stream (map to > ref/int/long/double), and we've separated functional reduce (currently > called reduce) from mutable reduce (currently called collect), I think that > leaves room for taking out one of the reduce methods from Stream: > > U reduce(U identity, > BiFunction accumulator, > BinaryOperator reducer); > > This is the one that confuses everyone anyway, and I don't think we need > it any more. > > The argument for having this form instead of discrete map+reduce are: > - fused map+reduce reduces boxing > - this three-arg form can also fold filtering into the accumulation > > However, since we now have primitive-bearing map methods, and we can do > filtering before and after the map, is this form really carrying its > weight? Specifically because people find it counterintuitive, we should > consider dropping it and guiding people towards map+reduce. > > For example, "sum of pages" over a stream of Documents is better written > as: > > docs.map(Document::**getPageCount).sum() > > rather than > > docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) > > The big place where we need three-arg reduce is when we're folding into a > mutable store. But that's now handled by collect(). > > Have I missed any use cases that would justify keeping this form? > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From joe.bowbeer at gmail.com Mon Feb 11 08:57:41 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 11 Feb 2013 08:57:41 -0800 Subject: Reducing reduce In-Reply-To: <51191F3D.4090203@oracle.com> References: <51191F3D.4090203@oracle.com> Message-ID: My parallel string-compare sample provides two implementations (below). Will both of these survive your changes? 
bitbucket.org/joebowbeer/stringcompare int compareMapReduce(String s1, String s2) { assert s1.length() == s2.length(); return intRange(0, s1.length()).parallel() .map(i -> compare(s1.charAt(i), s2.charAt(i))) .reduce(0, (l, r) -> (l != 0) ? l : r); } int compareBoxedReduce(String s1, String s2) { assert s1.length() == s2.length(); return intRange(0, s1.length()).parallel().boxed() .reduce(0, (l, i) -> (l != 0) ? l : compare(s1.charAt(i), s2.charAt(i)), (l, r) -> (l != 0) ? l : r); } The person who sold me the second form told me it would "burn less heat". He said that I could optimize my map/reduce by having it "not even calculate *f* if the left operand is nonzero, by combining the map and reduce steps into a fold." What is that person going to sell me now? Joe On Mon, Feb 11, 2013 at 8:41 AM, Brian Goetz wrote: > Now that we've added all the shapes of map() to Stream (map to > ref/int/long/double), and we've separated functional reduce (currently > called reduce) from mutable reduce (currently called collect), I think that > leaves room for taking out one of the reduce methods from Stream: > > U reduce(U identity, > BiFunction accumulator, > BinaryOperator reducer); > > This is the one that confuses everyone anyway, and I don't think we need > it any more. > > The argument for having this form instead of discrete map+reduce are: > - fused map+reduce reduces boxing > - this three-arg form can also fold filtering into the accumulation > > However, since we now have primitive-bearing map methods, and we can do > filtering before and after the map, is this form really carrying its > weight? Specifically because people find it counterintuitive, we should > consider dropping it and guiding people towards map+reduce. 
> > For example, "sum of pages" over a stream of Documents is better written > as: > > docs.map(Document::**getPageCount).sum() > > rather than > > docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) > > The big place where we need three-arg reduce is when we're folding into a > mutable store. But that's now handled by collect(). > > Have I missed any use cases that would justify keeping this form? > > From brian.goetz at oracle.com Mon Feb 11 09:12:01 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 11 Feb 2013 12:12:01 -0500 Subject: Reducing reduce In-Reply-To: References: <51191F3D.4090203@oracle.com> Message-ID: <51192661.2010501@oracle.com> Thanks, Joe. I knew I was missing some use cases. This is definitely a case where the fused version is more efficient, since it can elide some work based on the previous comparison state. On 2/11/2013 11:57 AM, Joe Bowbeer wrote: > My parallel string-compare sample provides two implementations (below). > > Will both of these survive your changes? > > bitbucket.org/joebowbeer/stringcompare > > > int compareMapReduce(String s1, String s2) { > assert s1.length() == s2.length(); > return intRange(0, s1.length()).parallel() > .map(i -> compare(s1.charAt(i), s2.charAt(i))) > .reduce(0, (l, r) -> (l != 0) ? l : r); > } > > int compareBoxedReduce(String s1, String s2) { > assert s1.length() == s2.length(); > return intRange(0, s1.length()).parallel().boxed() > .reduce(0, (l, i) -> (l != 0) ? l : compare(s1.charAt(i), s2.charAt(i)), > (l, r) -> (l != 0) ? l : r); > } > > > > > The person who sold me the second form told me it would "burn less > heat". He said that I could optimize my map/reduce by having it "not > even calculate *f* if the left operand is nonzero, by combining the map > and reduce steps into a fold." > > What is that person going to sell me now? 
> > Joe > > > On Mon, Feb 11, 2013 at 8:41 AM, Brian Goetz > wrote: > > Now that we've added all the shapes of map() to Stream (map to > ref/int/long/double), and we've separated functional reduce > (currently called reduce) from mutable reduce (currently called > collect), I think that leaves room for taking out one of the reduce > methods from Stream: > > U reduce(U identity, > BiFunction accumulator, > BinaryOperator reducer); > > This is the one that confuses everyone anyway, and I don't think we > need it any more. > > The argument for having this form instead of discrete map+reduce are: > - fused map+reduce reduces boxing > - this three-arg form can also fold filtering into the accumulation > > However, since we now have primitive-bearing map methods, and we can > do filtering before and after the map, is this form really carrying > its weight? Specifically because people find it counterintuitive, > we should consider dropping it and guiding people towards map+reduce. > > For example, "sum of pages" over a stream of Documents is better > written as: > > docs.map(Document::__getPageCount).sum() > > rather than > > docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) > > The big place where we need three-arg reduce is when we're folding > into a mutable store. But that's now handled by collect(). > > Have I missed any use cases that would justify keeping this form? > > From paul.sandoz at oracle.com Mon Feb 11 09:43:09 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 11 Feb 2013 18:43:09 +0100 Subject: Reducing reduce In-Reply-To: <51192661.2010501@oracle.com> References: <51191F3D.4090203@oracle.com> <51192661.2010501@oracle.com> Message-ID: <26AD1E48-38C6-427B-AE75-8B83E440D95D@oracle.com> On Feb 11, 2013, at 6:12 PM, Brian Goetz wrote: > Thanks, Joe. I knew I was missing some use cases. This is definitely a case where the fused version is more efficient, since it can elide some work based on the previous comparison state. 
>

And, efficiency-wise, it would be nice to avoid the boxed().

Paul.

> On 2/11/2013 11:57 AM, Joe Bowbeer wrote:
>> My parallel string-compare sample provides two implementations (below).
>>
>> Will both of these survive your changes?
>>
>> bitbucket.org/joebowbeer/stringcompare
>>
>> int compareMapReduce(String s1, String s2) {
>>   assert s1.length() == s2.length();
>>   return intRange(0, s1.length()).parallel()
>>     .map(i -> compare(s1.charAt(i), s2.charAt(i)))
>>     .reduce(0, (l, r) -> (l != 0) ? l : r);
>> }
>>
>> int compareBoxedReduce(String s1, String s2) {
>>   assert s1.length() == s2.length();
>>   return intRange(0, s1.length()).parallel().boxed()
>>     .reduce(0, (l, i) -> (l != 0) ? l : compare(s1.charAt(i), s2.charAt(i)),
>>       (l, r) -> (l != 0) ? l : r);
>> }

From dl at cs.oswego.edu Tue Feb 12 04:52:23 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 12 Feb 2013 07:52:23 -0500
Subject: Spliterator.tryAdvance
In-Reply-To: <51190FA0.3020601@univ-mlv.fr>
References: <51190FA0.3020601@univ-mlv.fr>
Message-ID: <511A3B07.1060906@cs.oswego.edu>

On 02/11/13 10:34, Remi Forax wrote:
> There is another point,
> the specification should be relaxed to allow tryAdvance to not always call the
> consumer taken as parameter.

These are all the same issue in disguise (including the one you mentioned that I didn't get :-)

The question is: Can you design Spliterators and/or related leaf-computation-level support such that none of the "basic" Stream ops require use of a lambda / inner class that needs a mutable variable?

I took this path in ConcurrentHashMap (see http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/ConcurrentHashMap.html), resulting in 4 "basic" methods (plus 3 more for primitives). I think it is the right solution for CHM, but it cannot apply to Streams (CHM can rely on nullness, and imposes requirement that users pre-fuse multiple map and map-reduce ops, etc.)
And if you explore what it would take to do this for the Stream API, it gets quickly out of hand -- at least a dozen or so operations that every Collection, Map, or other Stream/Spliterator source author would have to write. Which led to the present solution of only requiring forEach, trySplit, and tryAdvance.

-Doug

> If, for example, I want to implement a Spliterator that filters the elements,
> this implementation should be legal:
>
> class FilterSpliterator<T> implements Spliterator<T> {
>   private final Spliterator<T> spliterator;
>   private final Predicate<? super T> predicate;
>
>   public FilterSpliterator(Spliterator<T> spliterator, Predicate<? super T> predicate) {
>     ....
>   }
>
>   public void tryAdvance(Consumer<T> consumer) {
>     spliterator.tryAdvance(element -> {
>       if (predicate.test(element)) {
>         consumer.accept(element);
>       }
>     });
>   }
> }
>
> otherwise, you have to use a while loop around spliterator.tryAdvance, but
> because there is no way to transmit the information that the element is
> accepted or not
> (see my previous mail), you can not use a lambda here and you have to rely on an
> inner class.
>
> cheers,
> Rémi

From forax at univ-mlv.fr Tue Feb 12 08:04:43 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 12 Feb 2013 17:04:43 +0100
Subject: Spliterator.tryAdvance
In-Reply-To: <511A3B07.1060906@cs.oswego.edu>
References: <51190FA0.3020601@univ-mlv.fr> <511A3B07.1060906@cs.oswego.edu>
Message-ID: <511A681B.2090104@univ-mlv.fr>

On 02/12/2013 01:52 PM, Doug Lea wrote:
> On 02/11/13 10:34, Remi Forax wrote:
>> There is another point,
>> the specification should be relaxed to allow tryAdvance to not always
>> call the
>> consumer taken as parameter.
>
> These are all the same issue in disguise (including the one
> you mentioned that I didn't get :-)

CHM.search is different from the proposed Spliterator.tryAdvance that returns a value because tryAdvance never consumes more than one element (just one, in fact).
With that, you can use a well known value to say "I've done nothing", and you don't need to rely on "null" meaning nothing.

> The question is: Can you design Spliterators and/or
> related leaf-computation-level support such that none
> of the "basic" Stream ops require use of a lambda / inner class
> that needs a mutable variable?

yes !

> I took this path in ConcurrentHashMap
> (see
> http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/ConcurrentHashMap.html),
> resulting in 4 "basic" methods
> (plus 3 more for primitives). I think it is the right solution
> for CHM, but it cannot apply to Streams (CHM can rely on
> nullness, and imposes requirement that users pre-fuse multiple
> map and map-reduce ops, etc.)
>
> And if you explore what it would take to do this for the
> Stream API, it gets quickly out of hand -- at least
> a dozen or so operations that every Collection, Map, or
> other Stream/Spliterator source author would have to write.
> Which led to the present solution of only requiring
> forEach, trySplit, and tryAdvance.

forEach and tryAdvance that rely on side effects are not that bad for the leaf of a fork/join because, by design, fork/join puts variables into fields. But for a serial stream, forcing values to be stored in fields instead of on the stack or in registers is really a bad idea from a perf point of view.

Pre-fused operations try to tackle another problem, the fact that calls to the lambda are megamorphic. This can be solved in the stream by having dedicated paths or generating one code for the whole pipeline.

Here, we are talking about the spliterator interface, no other interface. IMO, the spliterator interface should have 3 operations: void forEach(Consumer), Object tryAdvance(Object, Function) that takes an element and tries to call the function on it, and U reduce(U, Function) (and reduceInt/reduceLong/reduceDouble).

/**
 * Sentinel value used by tryAdvance to signal that there is no more element.
 */
public static final Object END = new Object();

/**
 * If no remaining element exists, tryAdvance returns {@code END}.
 * If a remaining element exists, tryAdvance will try to perform the given action on it:
 * if the remaining element is filtered out, then the value noValue taken as parameter is returned,
 * else the action is called with the remaining element.
 *
 * @param noValue a value returned if the element is filtered out
 * @param action the action to perform.
 * @return {@code END} if no remaining elements existed
 *         upon entry to this method, else the return value of the action.
 */
Object tryAdvance(Object noValue, Function action);

so yes, there are more methods to implement, but you can use lambdas for most of the basic operations instead of using inner classes. So I'm not sure it's more cumbersome.

> -Doug

Rémi

>> If, for example, I want to implement a Spliterator that filters the
>> elements,
>> this implementation should be legal:
>>
>> class FilterSpliterator<T> implements Spliterator<T> {
>>   private final Spliterator<T> spliterator;
>>   private final Predicate<? super T> predicate;
>>
>>   public FilterSpliterator(Spliterator<T> spliterator, Predicate<? super T> predicate) {
>>     ....
>>   }
>>
>>   public void tryAdvance(Consumer<T> consumer) {
>>     spliterator.tryAdvance(element -> {
>>       if (predicate.test(element)) {
>>         consumer.accept(element);
>>       }
>>     });
>>   }
>> }
>>
>> otherwise, you have to use a while loop around spliterator.tryAdvance, but
>> because there is no way to transmit the information that the element is
>> accepted or not
>> (see my previous mail), you can not use a lambda here and you have to
>> rely on an
>> inner class.
>>
>> cheers,
>> Rémi

From brian.goetz at oracle.com Tue Feb 12 10:41:51 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 12 Feb 2013 13:41:51 -0500
Subject: FlatMapper
Message-ID: <511A8CEF.8070800@oracle.com>

Here's where things have currently landed with FlatMapper -- this is a type in java.util.stream, with nested specializations.
Full bikeshed season is now open. Are we OK with the name explodeInto()? Is this general enough to join the ranks of Function and Supplier as top-level types in java.util.function?

@FunctionalInterface
public interface FlatMapper<T, R> {
    void explodeInto(T element, Consumer<R> sink);

    @FunctionalInterface
    interface ToInt<T> {
        void explodeInto(T element, IntConsumer sink);
    }

    @FunctionalInterface
    interface ToLong<T> {
        void explodeInto(T element, LongConsumer sink);
    }

    @FunctionalInterface
    interface ToDouble<T> {
        void explodeInto(T element, DoubleConsumer sink);
    }

    @FunctionalInterface
    interface OfIntToInt {
        void explodeInto(int element, IntConsumer sink);
    }

    @FunctionalInterface
    interface OfLongToLong {
        void explodeInto(long element, LongConsumer sink);
    }

    @FunctionalInterface
    interface OfDoubleToDouble {
        void explodeInto(double element, DoubleConsumer sink);
    }
}

From joe.bowbeer at gmail.com Tue Feb 12 10:49:09 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Tue, 12 Feb 2013 10:49:09 -0800
Subject: FlatMapper
In-Reply-To: <511A8CEF.8070800@oracle.com>
References: <511A8CEF.8070800@oracle.com>
Message-ID: 

A verb that had some relation to "flat" would be nice - instead of explode, which doesn't.

flatten
extrude
??

On Feb 12, 2013 10:42 AM, "Brian Goetz" wrote:
> Here's where things have currently landed with FlatMapper -- this is a
> type in java.util.stream, with nested specializations.
>
> Full bikeshed season is now open. Are we OK with the name explodeInto()?
> Is this general enough to join the ranks of Function and Supplier as
> top-level types in java.util.function?
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > From Donald.Raab at gs.com Tue Feb 12 10:52:54 2013 From: Donald.Raab at gs.com (Raab, Donald) Date: Tue, 12 Feb 2013 13:52:54 -0500 Subject: FlatMapper In-Reply-To: <511A8CEF.8070800@oracle.com> References: <511A8CEF.8070800@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> Are we going to have a consistency issue with FlatMapper vs. Function? For instance we have ToIntFunction, but not ToIntFlatMapper. Instead we have FlatMapper.ToInt. > -----Original Message----- > From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda- > libs-spec-experts-bounces at openjdk.java.net] On Behalf Of Brian Goetz > Sent: Tuesday, February 12, 2013 1:42 PM > To: lambda-libs-spec-experts at openjdk.java.net > Subject: FlatMapper > > Here's where things have currently landed with FlatMapper -- this is a type > in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name explodeInto()? > Is this general enough to join the ranks of Function and Supplier as top- > level types in java.util.function? 
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } From brian.goetz at oracle.com Tue Feb 12 10:57:56 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 12 Feb 2013 13:57:56 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> Message-ID: <511A90B4.8030609@oracle.com> Since the name doesn't appear in implementations often, we can use a more descriptive name, even if it is long, such as mapAndFlattenInto. Would that be better? On 2/12/2013 1:49 PM, Joe Bowbeer wrote: > A verb that had some relation to "flat" would be nice - instead of > explode, which doesn't. > > flatten > extrude > ?? > > On Feb 12, 2013 10:42 AM, "Brian Goetz" > wrote: > > Here's where things have currently landed with FlatMapper -- this is > a type in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name > explodeInto()? Is this general enough to join the ranks of Function > and Supplier as top-level types in java.util.function? 
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > From brian.goetz at oracle.com Tue Feb 12 10:59:58 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 12 Feb 2013 13:59:58 -0500 Subject: FlatMapper In-Reply-To: <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> References: <511A8CEF.8070800@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <511A912E.1040205@oracle.com> > Are we going to have a consistency issue with FlatMapper vs. > Function? For instance we have ToIntFunction, but not > ToIntFlatMapper. Instead we have FlatMapper.ToInt. I think that's a function of where it lands, which is open for discussion. Currently it is in java.util.stream, where the dominant convention is Foo.OfBar. If we moved it to java.util.function, we'd "flatten" the namespace. I currently lean towards JUS, since this does not seem as important a top-level type as Function, Predicate, or Supplier. But such decisions often turn around to bite one. 
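The FlatMapper shape under discussion can be exercised with a small self-contained sketch. Everything here apart from the explodeInto signature is a hypothetical stand-in: the type parameters are inferred (the archived listing dropped them), and the flatMap driver is scaffolding added only to show how a mapper pushes zero or more outputs into a sink:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class FlatMapperSketch {

    // Stand-in for the proposed type; the type parameters are a guess.
    @FunctionalInterface
    interface FlatMapper<T, U> {
        void explodeInto(T element, Consumer<U> sink);
    }

    // Scaffolding (not part of the proposal): drive the mapper over a list,
    // collecting everything it pushes into the sink.
    static <T, U> List<U> flatMap(List<T> source, FlatMapper<T, U> mapper) {
        List<U> out = new ArrayList<>();
        for (T t : source) {
            mapper.explodeInto(t, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        // One input element can yield zero, one, or many output elements.
        List<String> words = flatMap(Arrays.asList("flat map", "into", ""),
                (line, sink) -> {
                    for (String w : line.split(" ")) {
                        if (!w.isEmpty()) {
                            sink.accept(w);
                        }
                    }
                });
        System.out.println(words); // [flat, map, into]
    }
}
```

Note that the mapper never returns a value; it pushes results into the Consumer, which is what distinguishes it from an ordinary Function.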
From zhong.j.yu at gmail.com Tue Feb 12 11:19:18 2013 From: zhong.j.yu at gmail.com (Zhong Yu) Date: Tue, 12 Feb 2013 13:19:18 -0600 Subject: FlatMapper In-Reply-To: <511A8CEF.8070800@oracle.com> References: <511A8CEF.8070800@oracle.com> Message-ID: One common use case is to map zero or more elements in stream A into one element in stream B. People can, and will, use flatMap(FlatMapper) to achieve that (with a side-effecting mapper), even though it is the opposite of what "flat map" was known for. I think the method could use a more general name which is appropriate for both explode/implode. Another use case is to aggregate *some* (not all) elements of a stream to produce a result; it can be done by stream.flatMap(FlatMapper).findFirst(). If this use case is common enough, it deserves a standalone method, say aggregatePartially(FlatMapper). Now if FlatMapper is needed in other places too, it could use a more general name; after all, it is pretty much a normal function, except it inserts its result in a sink instead of returning it. Zhong Yu On Tue, Feb 12, 2013 at 12:41 PM, Brian Goetz wrote: > Here's where things have currently landed with FlatMapper -- this is a type > in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name explodeInto()? > Is this general enough to join the ranks of Function and Supplier as > top-level types in java.util.function? 
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } From forax at univ-mlv.fr Tue Feb 12 11:17:46 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 12 Feb 2013 20:17:46 +0100 Subject: FlatMapper In-Reply-To: <511A912E.1040205@oracle.com> References: <511A8CEF.8070800@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> <511A912E.1040205@oracle.com> Message-ID: <511A955A.8050208@univ-mlv.fr> On 02/12/2013 07:59 PM, Brian Goetz wrote: >> Are we going to have a consistency issue with FlatMapper vs. >> Function? For instance we have ToIntFunction, but not >> ToIntFlatMapper. Instead we have FlatMapper.ToInt. > > > I think that's a function of where it lands, which is open for > discussion. Currently it is in java.util.stream, where the dominant > convention is Foo.OfBar. If we moved it to java.util.function, we'd > "flatten" the namespace. > > I currently lean towards JUS, since this does not seem as important a > top-level type as Function, Predicate, or Supplier. But such decisions > often turn around to bite one. > I've already said this, but nobody cares: there is a usability issue with FlatMapper.ToInt, the very same as the one reported more than 10 years ago when java.awt.geom introduced classes like Point2D.Double [1][2]. 
cheers, Rémi [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4198349 [2] https://forums.oracle.com/forums/thread.jspa?threadID=1665781 From forax at univ-mlv.fr Tue Feb 12 11:45:23 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 12 Feb 2013 20:45:23 +0100 Subject: FlatMapper In-Reply-To: <738F0591-0F5F-443B-8C76-5A5B69556A2B@gmail.com> References: <511A8CEF.8070800@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> <511A912E.1040205@oracle.com> <511A955A.8050208@univ-mlv.fr> <738F0591-0F5F-443B-8C76-5A5B69556A2B@gmail.com> Message-ID: <511A9BD3.8070006@univ-mlv.fr> On 02/12/2013 08:27 PM, Sam Pullara wrote: > I don't get it. Why not just avoid importing Point2D.Double? This works fine: > > package spullara; > > import java.awt.geom.Point2D; > > public class Test { > public static void main(String[] args) { > Point2D.Double p = new Point2D.Double(10.0, 20.0); > double d = Double.parseDouble("123.45"); > } > } > > Sam yes, just ... but who reads imports these days. Rémi > > On Feb 12, 2013, at 11:17 AM, Remi Forax wrote: > >> On 02/12/2013 07:59 PM, Brian Goetz wrote: >>>> Are we going to have a consistency issue with FlatMapper vs. >>>> Function? For instance we have ToIntFunction, but not >>>> ToIntFlatMapper. Instead we have FlatMapper.ToInt. >>> >>> I think that's a function of where it lands, which is open for discussion. Currently it is in java.util.stream, where the dominant convention is Foo.OfBar. If we moved it to java.util.function, we'd "flatten" the namespace. >>> >>> I currently lean towards JUS, since this does not seem as important a top-level type as Function, Predicate, or Supplier. But such decisions often turn around to bite one. >>> >> I've already said this, but nobody cares, >> there is a usability issue with FlatMapper.ToInt, the very same as the one reported >> more than 10 years ago when java.awt.geom introduced classes like Point2D.Double [1][2]. 
>> >> cheers, >> Rémi >> [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4198349 >> [2] https://forums.oracle.com/forums/thread.jspa?threadID=1665781 >> From forax at univ-mlv.fr Thu Feb 14 07:53:23 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 14 Feb 2013 16:53:23 +0100 Subject: A small JSON parsing library Message-ID: <511D0873.4000205@univ-mlv.fr> Hi all, just to see how it goes, I've written a small all-in-one-file library that parses JSON files into an object tree defined by the user. The mapping is specified using lambdas, so it's compact but can burn your eyes :) https://github.com/forax/jsonjedi The idea is that the JSON parser is triggered by the consumption of the corresponding stream, so when the code does, for example, a forEach on a Stream, the parser parses the corresponding objects. I've used an already existing push parser named json-simple for that. During the development, I've found two main gotchas. The first one is the scope rules of the lambda parameter; I've already sent a message about this rule. It seems that each time I write a page of code, the compiler stops me because I tend to re-use the same variable name for the very same object. In the example named Big [1], the builder of JSON schema is used recursively but I've had to use different names each time (builder, builder2, builder3). We should really remove this stupid rule from the JLS and go back to the classical shadowing rules. The second problem is that the implementation uses method handles, but due to the poor integration between method handles and lambdas, I have 20 lines of boilerplate and error-prone code [2] which is moreover executed too eagerly. 
cheers, Rémi [1] https://github.com/forax/jsonjedi/blob/master/src/Big.java [2] https://github.com/forax/jsonjedi/blob/master/src/jsonjedi/JSONSchemaBuilder.java#L358 From brian.goetz at oracle.com Thu Feb 14 08:20:42 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 14 Feb 2013 11:20:42 -0500 Subject: A small JSON parsing library In-Reply-To: <511D0873.4000205@univ-mlv.fr> References: <511D0873.4000205@univ-mlv.fr> Message-ID: <511D0EDA.4000106@oracle.com> > just to see how it goes, I've written a small all-in-one-file library > that parses JSON files into an object tree defined by the user. The > mapping is specified using lambdas, so it's compact but can burn your > eyes :) > > https://github.com/forax/jsonjedi Very cool! > During the development, I've found two main gotchas, > the first one is the scope rules of the lambda parameter, > I've already sent a message about this rule, it seems that each time I > write a page of code, the compiler stops me because I tend to re-use the > same variable name for the very same object. > In the example named Big [1], the builder of JSON schema is used > recursively but I've had to use different names each time (builder, > builder2, builder3). I can see how this would be annoying to write. But as a reader, I really prefer it! If all the variables were called "builder", I would be confused. Much easier if I know exactly which builder you are referring to -- especially since the declaration -- "builder -> { ..." -- often starts near the right margin. > We should really remove this stupid rule from the JLS and go back to the > classical shadowing rules. Your Big.java provides an excellent example of why this rule is great! 
:) From spullara at gmail.com Thu Feb 14 08:45:09 2013 From: spullara at gmail.com (Sam Pullara) Date: Thu, 14 Feb 2013 08:45:09 -0800 Subject: A small JSON parsing library In-Reply-To: <511D0873.4000205@univ-mlv.fr> References: <511D0873.4000205@univ-mlv.fr> Message-ID: This rule has also been a pain for me. Since naming is one of the hardest things in computer science, we shouldn't make it any harder. Sam On Feb 14, 2013 8:12 AM, "Remi Forax" wrote: > Hi all, > just to see how it goes, I've written a small all-in-one-file library that > parses JSON files into an object tree defined by the user. The mapping is > specified using lambdas, so it's compact but can burn your eyes :) > > https://github.com/forax/jsonjedi > > The idea is that the JSON parser is triggered by the consumption of the > corresponding stream, > so when the code does, for example, a forEach on a Stream, the parser parses the > corresponding objects. > I've used an already existing push parser named json-simple for that. > > During the development, I've found two main gotchas, > the first one is the scope rules of the lambda parameter, > I've already sent a message about this rule, it seems that each time I write > a page of code, the compiler stops me because I tend to re-use the same > variable name for the very same object. > In the example named Big [1], the builder of JSON schema is used > recursively but I've had to use different names each time (builder, builder2, > builder3). > We should really remove this stupid rule from the JLS and go back to the > classical shadowing rules. > > The second problem is that the implementation uses method handles, but > due to the poor integration between method handles and lambdas, I have 20 > lines of boilerplate and error-prone code [2] which is moreover executed > too eagerly. 
> > cheers, > Rémi > > [1] https://github.com/forax/jsonjedi/blob/master/src/Big.java > [2] https://github.com/forax/jsonjedi/blob/master/src/ > jsonjedi/JSONSchemaBuilder.java#L358 > > From maurizio.cimadamore at oracle.com Thu Feb 14 09:15:37 2013 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 14 Feb 2013 17:15:37 +0000 Subject: A small JSON parsing library In-Reply-To: <511D0EDA.4000106@oracle.com> References: <511D0873.4000205@univ-mlv.fr> <511D0EDA.4000106@oracle.com> Message-ID: <511D1BB9.6040003@oracle.com> My eyes are still burning ;-) Very pleased with the total absence of type witnesses whatsoever. Maurizio On 14/02/13 16:20, Brian Goetz wrote: >> just to see how it goes, I've written a small all-in-one-file library >> that parses JSON files into an object tree defined by the user. The >> mapping is specified using lambdas, so it's compact but can burn your >> eyes :) >> >> https://github.com/forax/jsonjedi > > Very cool! > >> During the development, I've found two main gotchas, >> the first one is the scope rules of the lambda parameter, >> I've already sent a message about this rule, it seems that each time I >> write a page of code, the compiler stops me because I tend to re-use the >> same variable name for the very same object. >> In the example named Big [1], the builder of JSON schema is used >> recursively but I've had to use different names each time (builder, >> builder2, builder3). > > I can see how this would be annoying to write. But as a reader, I > really prefer it! If all the variables were called "builder", I would > be confused. Much easier if I know exactly which builder you are > referring to -- especially since the declaration -- "builder -> { ..." > -- often starts near the right margin. > >> We should really remove this stupid rule from the JLS and go back to the >> classical shadowing rules. > > Your Big.java provides an excellent example of why this rule is great! 
:) > > From spullara at gmail.com Thu Feb 14 09:36:03 2013 From: spullara at gmail.com (Sam Pullara) Date: Thu, 14 Feb 2013 09:36:03 -0800 Subject: A small JSON parsing library In-Reply-To: <511D0EDA.4000106@oracle.com> References: <511D0873.4000205@univ-mlv.fr> <511D0EDA.4000106@oracle.com> Message-ID: On Feb 14, 2013, at 8:20 AM, Brian Goetz > We should really remove this stupid rule from the JLS and go back to the >> classical shadowing rules. > > Your Big.java provides an excellent example why this rule is great! :) I think the opposite. All those builders are the same object. I'd like the freedom to name them the same. Sam From david.lloyd at redhat.com Thu Feb 14 10:48:57 2013 From: david.lloyd at redhat.com (David M. Lloyd) Date: Thu, 14 Feb 2013 12:48:57 -0600 Subject: A small JSON parsing library In-Reply-To: References: <511D0873.4000205@univ-mlv.fr> <511D0EDA.4000106@oracle.com> Message-ID: <511D3199.3080800@redhat.com> On 02/14/2013 11:36 AM, Sam Pullara wrote: > > On Feb 14, 2013, at 8:20 AM, Brian Goetz >> We should really remove this stupid rule from the JLS and go back to the >>> classical shadowing rules. >> >> Your Big.java provides an excellent example why this rule is great! :) > > I think the opposite. All those builders are the same object. I'd like the freedom to name them the same. I agree, it should be up to the user. -- - DML From forax at univ-mlv.fr Thu Feb 14 12:10:06 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 14 Feb 2013 21:10:06 +0100 Subject: A small JSON parsing library In-Reply-To: References: <511D0873.4000205@univ-mlv.fr> <511D0EDA.4000106@oracle.com> Message-ID: <511D449E.9080103@univ-mlv.fr> On 02/14/2013 06:36 PM, Sam Pullara wrote: > On Feb 14, 2013, at 8:20 AM, Brian Goetz >> We should really remove this stupid rule from the JLS and go back to the >>> classical shadowing rules. >> Your Big.java provides an excellent example why this rule is great! :) > I think the opposite. 
All those builders are the same object. I'd like the freedom to name them the same. > > Sam > > It's also annoying when you do a filter then map, like paths.stream().filter(path -> path != null).map(path -> path.getFileName().toString()); again, here, you want to use the same name, because it's the same thing. Rémi From brian.goetz at oracle.com Thu Feb 14 12:56:00 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 14 Feb 2013 15:56:00 -0500 Subject: FlatMapper In-Reply-To: <511A8CEF.8070800@oracle.com> References: <511A8CEF.8070800@oracle.com> Message-ID: <511D4F60.4040407@oracle.com> OK, so far we have: - Joe asks for a better method name -- no suggestions other than mapAndFlatten - No consensus on whether this goes into JUS or JUF. On 2/12/2013 1:41 PM, Brian Goetz wrote: > Here's where things have currently landed with FlatMapper -- this is a > type in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name > explodeInto()? Is this general enough to join the ranks of Function and > Supplier as top-level types in java.util.function? 
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } From joe.bowbeer at gmail.com Thu Feb 14 13:33:35 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 14 Feb 2013 13:33:35 -0800 Subject: FlatMapper In-Reply-To: <511D4F60.4040407@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> Message-ID: I'm not opposed to explode, but I think it would be better to find a verb that is related to flattening. Extrude is better in that regard than explode. extrudeInto mapAndExtrude On the downside, 'explode' has more hacker cred than 'extrude'. --Joe On Thu, Feb 14, 2013 at 12:56 PM, Brian Goetz wrote: > OK, so far we have: > - Joe asks for a better method name -- no suggestions other than > mapAndFlatten > - No consensus on whether this goes into JUS or JUF. > > > > > On 2/12/2013 1:41 PM, Brian Goetz wrote: > >> Here's where things have currently landed with FlatMapper -- this is a >> type in java.util.stream, with nested specializations. >> >> Full bikeshed season is now open. Are we OK with the name >> explodeInto()? Is this general enough to join the ranks of Function and >> Supplier as top-level types in java.util.function? 
>> >> @FunctionalInterface >> public interface FlatMapper { >> void explodeInto(T element, Consumer sink); >> >> @FunctionalInterface >> interface ToInt { >> void explodeInto(T element, IntConsumer sink); >> } >> >> @FunctionalInterface >> interface ToLong { >> void explodeInto(T element, LongConsumer sink); >> } >> >> @FunctionalInterface >> interface ToDouble { >> void explodeInto(T element, DoubleConsumer sink); >> } >> >> @FunctionalInterface >> interface OfIntToInt { >> void explodeInto(int element, IntConsumer sink); >> } >> >> @FunctionalInterface >> interface OfLongToLong { >> void explodeInto(long element, LongConsumer sink); >> } >> >> @FunctionalInterface >> interface OfDoubleToDouble { >> void explodeInto(double element, DoubleConsumer sink); >> } >> } >> > From brian.goetz at oracle.com Fri Feb 15 10:48:24 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 15 Feb 2013 13:48:24 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> Message-ID: <511E82F8.1060509@oracle.com> So it seems the choice is: - Keep this tied to flatMap and keep it in JUS. Advantage: makes the complicated flatMap(FlatMapper) operation easier to understand. - Abstract this into a general "map to multiple values and dump results into a Consumer" type, move to JUF, and rename to something like "MultiFunction". Advantage: more future flexibility; Disadvantage: mostly guessing about what we might want in the future. I lean towards the first. In which case the remaining decision is: what to name the method. Maybe: mapAndFlattenInto ? On 2/14/2013 4:33 PM, Joe Bowbeer wrote: > I'm not opposed to explode, but I think it would be better to find a > verb that is related to flattening. Extrude is better in that regard > than explode. > > extrudeInto > mapAndExtrude > > On the downside, 'explode' has more hacker cred than 'extrude'. 
> > --Joe > > > On Thu, Feb 14, 2013 at 12:56 PM, Brian Goetz > > wrote: > > OK, so far we have: > - Joe asks for a better method name -- no suggestions other than > mapAndFlatten > - No consensus on whether this goes into JUS or JUF. > > > > > On 2/12/2013 1:41 PM, Brian Goetz wrote: > > Here's where things have currently landed with FlatMapper -- > this is a > type in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name > explodeInto()? Is this general enough to join the ranks of > Function and > Supplier as top-level types in java.util.function? > > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > > From brian.goetz at oracle.com Fri Feb 15 12:04:14 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 15 Feb 2013 15:04:14 -0500 Subject: Characterizing stream operation Message-ID: <511E94BE.1010901@oracle.com> We've divided stream operations as follows: Intermediate operations. Always lazy. Always produce another stream. Stateful operations. A kind of intermediate operation. Currently always transforms to the same stream type (e.g., Stream<T> to Stream<T>), though this could conceivably change (we haven't found any, though). Must provide their own parallel implementation. 
Parallel pipelines containing stateful operations are implicitly "sliced" into segments on stateful operation boundaries, and executed in segments. Terminal operations. The only thing that kicks off stream computation. Produces a non-stream result (value or side-effects.) For each of these, once you perform an operation on a stream (intermediate or terminal), the stream is *consumed* and no more operations can be performed on that stream. (Not entirely true, as the TCK team will almost certainly point out to us eventually; there are some ops that are no-ops and probably will succeed unless we add consumed checks.) These names are fine from the perspective of the implementation; when implementing an operation, you will be implementing one of these three types, and conveniently there is a base type for each to subclass. From the user perspective, though, they may not be as helpful as some alternative taxonomies, such as: - lazy operation -- what we now call intermediate operation - stateful lazy operation -- what we now call stateful - consuming operation -- what we now call terminal These are good in that they keep a key characteristic -- when the computation happens -- in full view. However, they also create less clean boundaries. For example, iterator() is a consuming operation from the perspective of the stream, but from the perspective of the user, may be thought of as lazy. Thoughts on how to adjust this naming to be more intuitive to users? From forax at univ-mlv.fr Fri Feb 15 15:46:08 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 16 Feb 2013 00:46:08 +0100 Subject: Characterizing stream operation In-Reply-To: <511E94BE.1010901@oracle.com> References: <511E94BE.1010901@oracle.com> Message-ID: <511EC8C0.3010507@univ-mlv.fr> On 02/15/2013 09:04 PM, Brian Goetz wrote: > We've divided stream operations as follows: > > Intermediate operations. Always lazy. Always produce another stream. > > Stateful operations. A kind of intermediate operation. 
Currently > always transforms to the same stream type (e.g., Stream to > Stream), though this could conceivably change (we haven't found > any, though). Must provide their own parallel implementation. > Parallel pipelines containing stateful operations are implicitly > "sliced" into segments on stateful operation boundaries, and executed > in segments. > > Terminal operations. The only thing that kicks off stream > computation. Produces a non-stream result (value or side-effects.) > > For each of these, once you perform an operation on a stream > (intermediate or terminal), the stream is *consumed* and no more > operations can be performed on that stream. (Not entirely true, as > the TCK team will almost certainly point out to us eventually; there > are some ops that are no-ops and probably will succeed unless we add > consumed checks.) > > > These names are fine from the perspective of the implementation; when > implementing an operation, you will be implementing one of these three > types, and conveniently there is a base type for each to subclass. > > From the user perspective, though, they may not be as helpful as some > alternative taxonomies, such as: > > - lazy operation -- what we now call intermediate operation > - stateful lazy operation -- what we now call stateful > - consuming operation -- what we now call terminal > > These are good in that they keep a key characteristic -- when the > computation happens -- in full view. However, they also create less > clean boundaries. For example, iterator() is a consuming operation > from the perspective of the stream, but from the perspective of the > user, may be thought of as lazy. > > Thoughts on how to adjust this naming to be more intuitive to users? > lazy and terminal are Ok for me, stateful can be renamed to intermediate stateful. 
Rémi From joe.bowbeer at gmail.com Sat Feb 16 21:31:19 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 16 Feb 2013 21:31:19 -0800 Subject: FlatMapper In-Reply-To: <511E82F8.1060509@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> Message-ID: > mapAndFlattenInto ? OK On Fri, Feb 15, 2013 at 10:48 AM, Brian Goetz wrote: > So it seems the choice is: > > - Keep this tied to flatMap and keep it in JUS. Advantage: makes the > complicated flatMap(FlatMapper) operation easier to understand. > > - Abstract this into a general "map to multiple values and dump results > into a Consumer" type, move to JUF, and rename to something like > "MultiFunction". Advantage: more future flexibility; Disadvantage: mostly > guessing about what we might want in the future. > > I lean towards the first. > > In which case the remaining decision is: what to name the method. > > Maybe: > > mapAndFlattenInto > > ? > > > On 2/14/2013 4:33 PM, Joe Bowbeer wrote: > >> I'm not opposed to explode, but I think it would be better to find a >> verb that is related to flattening. Extrude is better in that regard >> than explode. >> >> extrudeInto >> mapAndExtrude >> >> On the downside, 'explode' has more hacker cred than 'extrude'. >> >> --Joe >> >> >> On Thu, Feb 14, 2013 at 12:56 PM, Brian Goetz > > wrote: >> >> OK, so far we have: >> - Joe asks for a better method name -- no suggestions other than >> mapAndFlatten >> - No consensus on whether this goes into JUS or JUF. >> >> >> >> >> On 2/12/2013 1:41 PM, Brian Goetz wrote: >> >> Here's where things have currently landed with FlatMapper -- >> this is a >> type in java.util.stream, with nested specializations. >> >> Full bikeshed season is now open. Are we OK with the name >> explodeInto()? Is this general enough to join the ranks of >> Function and >> Supplier as top-level types in java.util.function? 
>> >> @FunctionalInterface >> public interface FlatMapper { >> void explodeInto(T element, Consumer sink); >> >> @FunctionalInterface >> interface ToInt { >> void explodeInto(T element, IntConsumer sink); >> } >> >> @FunctionalInterface >> interface ToLong { >> void explodeInto(T element, LongConsumer sink); >> } >> >> @FunctionalInterface >> interface ToDouble { >> void explodeInto(T element, DoubleConsumer sink); >> } >> >> @FunctionalInterface >> interface OfIntToInt { >> void explodeInto(int element, IntConsumer sink); >> } >> >> @FunctionalInterface >> interface OfLongToLong { >> void explodeInto(long element, LongConsumer sink); >> } >> >> @FunctionalInterface >> interface OfDoubleToDouble { >> void explodeInto(double element, DoubleConsumer sink); >> } >> } >> >> >> From forax at univ-mlv.fr Sun Feb 17 03:17:06 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 17 Feb 2013 12:17:06 +0100 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> Message-ID: <5120BC32.3000309@univ-mlv.fr> On 02/17/2013 06:31 AM, Joe Bowbeer wrote: > > mapAndFlattenInto ? > > OK I really like explode just because we will see comments on the mailing list saying that someone has a problem when they try to explode :) mapAndFlattenInto is a little too verbose for me, mapAndFlat ? Rémi > > > On Fri, Feb 15, 2013 at 10:48 AM, Brian Goetz > wrote: > > So it seems the choice is: > > - Keep this tied to flatMap and keep it in JUS. Advantage: makes > the complicated flatMap(FlatMapper) operation easier to understand. > > - Abstract this into a general "map to multiple values and dump > results into a Consumer" type, move to JUF, and rename to > something like "MultiFunction". Advantage: more future > flexibility; Disadvantage: mostly guessing about what we might > want in the future. > > I lean towards the first. > > In which case the remaining decision is: what to name the method. 
> > Maybe: > > mapAndFlattenInto > > ? > > > On 2/14/2013 4:33 PM, Joe Bowbeer wrote: > > I'm not opposed to explode, but I think it would be better to > find a > verb that is related to flattening. Extrude is better in that > regard > than explode. > > extrudeInto > mapAndExtrude > > On the downside, 'explode' has more hacker cred than 'extrude'. > > --Joe > > > On Thu, Feb 14, 2013 at 12:56 PM, Brian Goetz > > >> wrote: > > OK, so far we have: > - Joe asks for a better method name -- no suggestions > other than > mapAndFlatten > - No consensus on whether this goes into JUS or JUF. > > > > > On 2/12/2013 1:41 PM, Brian Goetz wrote: > > Here's where things have currently landed with > FlatMapper -- > this is a > type in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name > explodeInto()? Is this general enough to join the > ranks of > Function and > Supplier as top-level types in java.util.function? > > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer > sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer > sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, > DoubleConsumer sink); > } > } > > > From tim at peierls.net Sun Feb 17 06:36:38 2013 From: tim at peierls.net (Tim Peierls) Date: Sun, 17 Feb 2013 09:36:38 -0500 Subject: FlatMapper In-Reply-To: <5120BC32.3000309@univ-mlv.fr> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> 
<511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> Message-ID: On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax wrote: > mapAndFlattenInto is a little to verbose for me, mapAndFlat ? > No, has to be a verb. I'd still understand flattenInto, leaving the mapping part to be implied by the type name. --tim From dl at cs.oswego.edu Sun Feb 17 06:55:37 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 17 Feb 2013 09:55:37 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> Message-ID: <5120EF69.2020105@cs.oswego.edu> On 02/17/13 09:36, Tim Peierls wrote: > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > wrote: > > mapAndFlattenInto is a little to verbose for me, mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping part to be implied by the > type name. > Deja vu all over again. Yeah, flattenInto seems fine. A similarly fused map-reduce just called reduce would also have been fine. More than fine... -Doug From brian.goetz at oracle.com Sun Feb 17 11:07:59 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 17 Feb 2013 14:07:59 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> Message-ID: <51212A8F.7000109@oracle.com> flattenInto seems the best so far. On 2/17/2013 9:36 AM, Tim Peierls wrote: > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > wrote: > > mapAndFlattenInto is a little to verbose for me, mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping part to be implied > by the type name. 
> > --tim From forax at univ-mlv.fr Sun Feb 17 11:06:12 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 17 Feb 2013 20:06:12 +0100 Subject: FlatMapper In-Reply-To: <51212A8F.7000109@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> Message-ID: <51212A24.2020101@univ-mlv.fr> On 02/17/2013 08:07 PM, Brian Goetz wrote: > flattenInto seems the best so far. +1 Rémi > > On 2/17/2013 9:36 AM, Tim Peierls wrote: >> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > wrote: >> >> mapAndFlattenInto is a little too verbose for me, mapAndFlat ? >> >> >> No, has to be a verb. >> >> I'd still understand flattenInto, leaving the mapping part to be implied >> by the type name. >> >> --tim From joe.bowbeer at gmail.com Sun Feb 17 12:29:25 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sun, 17 Feb 2013 12:29:25 -0800 Subject: FlatMapper In-Reply-To: <51212A24.2020101@univ-mlv.fr> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> Message-ID: flattenInto gets my vote On Feb 17, 2013 11:09 AM, "Remi Forax" wrote: > On 02/17/2013 08:07 PM, Brian Goetz wrote: > >> flattenInto seems the best so far. >> > > +1 > > Rémi > > >> On 2/17/2013 9:36 AM, Tim Peierls wrote: >> >>> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >>> > wrote: >>> >>> mapAndFlattenInto is a little too verbose for me, mapAndFlat ? >>> >>> >>> No, has to be a verb. >>> >>> I'd still understand flattenInto, leaving the mapping part to be implied >>> by the type name. 
>>> >>> --tim >>> >> > From brian.goetz at oracle.com Mon Feb 18 12:20:23 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 15:20:23 -0500 Subject: Reducing reduce In-Reply-To: <51191F3D.4090203@oracle.com> References: <51191F3D.4090203@oracle.com> Message-ID: <51228D07.9060004@oracle.com> Circling back to this (i.e., "reducing reduce", redux): There are a lot of considerations here, many mostly accidental (e.g., consequences of erasure and the primitive/reference divide). The three-arg functional reduce form is functionally equivalent to the two-arg form, except that there are some constructions that are more efficient to handle in the three arg form. However, the best example we came up with, Joe's string compare, suffers because he had to use boxing. So we're currently in a place where the best example to support this form has other defects that make the form hard to support. And, any form of functional reduce on a reference would likely result in a lot of object creation, so the optimization of eliding some of the mapping would have to overcome that. Further, one can still handle this without boxing using collect() and an explicit mutable result holder. On the other hand, if/when the language acquires tuples, it will be a very different story, and this form would become infinitely more useful. So I think the evidence weighs slightly in favor of ditching this form for now (though I'd feel better if people didn't have to use either an ad-hoc class or a single-element array as the data box when using the collect() form.) Secondarily, ditching the three-arg form from Stream would remove one element of support for naming reduce and collect differently; part of the motivation for a different name was that the three-arg collect and three-arg reduce overloaded very poorly if they had the same name. However, I think we should resist the temptation to act on this. 
I think (a) there is pedagogical value in separating function and mutable reduce forms, and (b) if we do this, we slam the door on the more flexible version, which will badly bite us in a tupled future. We might still consider the three-arg version for IntStream. That's the case where Joe's example works. On 2/11/2013 11:41 AM, Brian Goetz wrote: > Now that we've added all the shapes of map() to Stream (map to > ref/int/long/double), and we've separated functional reduce (currently > called reduce) from mutable reduce (currently called collect), I think > that leaves room for taking out one of the reduce methods from Stream: > > U reduce(U identity, > BiFunction accumulator, > BinaryOperator reducer); > > This is the one that confuses everyone anyway, and I don't think we need > it any more. > > The argument for having this form instead of discrete map+reduce are: > - fused map+reduce reduces boxing > - this three-arg form can also fold filtering into the accumulation > > However, since we now have primitive-bearing map methods, and we can do > filtering before and after the map, is this form really carrying its > weight? Specifically because people find it counterintuitive, we should > consider dropping it and guiding people towards map+reduce. > > For example, "sum of pages" over a stream of Documents is better written > as: > > docs.map(Document::getPageCount).sum() > > rather than > > docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) > > The big place where we need three-arg reduce is when we're folding into > a mutable store. But that's now handled by collect(). > > Have I missed any use cases that would justify keeping this form? 
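The forms being compared in this thread can be written out concretely. The sketch below uses the Stream API as it eventually shipped in Java 8 -- where the three-arg `reduce(identity, accumulator, combiner)` did in fact survive, alongside `collect` with an explicit mutable holder -- and stands in for the `Document` example with string lengths:

```java
import java.util.Arrays;
import java.util.List;

public class ReduceForms {

    // Discrete map + reduce: map each element to an int, then sum.
    public static int viaMapSum(List<String> docs) {
        return docs.stream().mapToInt(String::length).sum();
    }

    // Fused three-arg reduce: identity, accumulator (folds one element
    // into a partial result), and combiner (merges partial results from
    // parallel subtasks; unused in a sequential run but must be supplied).
    public static int viaReduce(List<String> docs) {
        return docs.stream().reduce(0, (count, s) -> count + s.length(), Integer::sum);
    }

    // Mutable reduction: collect() with an explicit one-slot result holder,
    // the "single-element array as the data box" mentioned above.
    public static int viaCollect(List<String> docs) {
        int[] box = docs.stream().collect(
                () -> new int[1],
                (a, s) -> a[0] += s.length(),
                (a, b) -> a[0] += b[0]);
        return box[0];
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("ab", "cde", "f");
        System.out.println(viaMapSum(docs));  // 6
        System.out.println(viaReduce(docs));  // 6
        System.out.println(viaCollect(docs)); // 6
    }
}
```

The map+sum form avoids boxing entirely via the int-specialized stream, which is the crux of the "is the fused form carrying its weight" question.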
> From brian.goetz at oracle.com Mon Feb 18 12:50:52 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 15:50:52 -0500 Subject: forEachUntil Message-ID: <5122942C.7000600@oracle.com> Based on further user feedback, I think the name forEachUntil is too confusing; it makes people (including some members of this expert group) think that it is supposed to be an encounter-based limiting operation, rather than an externally-based cancelling operation. Until seems to be inextricably linked in people's minds to encounter order, with all the attendant confusion. People seem more able to understand cancellation, and in particular to understand that cancellation is usually a cooperative, best-efforts thing rather than the deterministic content-based limiting that people have in mind. Accordingly, I think we should rename to "forEachWithCancel", which is more suggestive (and, secondarily, the ugly name subtly reinforces that it serves uncommon use cases.) From tim at peierls.net Mon Feb 18 13:29:51 2013 From: tim at peierls.net (Tim Peierls) Date: Mon, 18 Feb 2013 16:29:51 -0500 Subject: forEachUntil In-Reply-To: <5122942C.7000600@oracle.com> References: <5122942C.7000600@oracle.com> Message-ID: Overloading forEach isn't possible? I don't think extra uglification beyond including a canCancel argument is needed to reinforce the uncommonness of the usage. --tim On Mon, Feb 18, 2013 at 3:50 PM, Brian Goetz wrote: > Based on further user feedback, I think the name forEachUntil is too > confusing; it makes people (including some members of this expert group) > think that it is supposed to be an encounter-based limiting operation, > rather than an externally-based cancelling operation. Until seems to be > inextricably linked in people's minds to encounter order, with all the > attendant confusion. 
People seem more able to understand cancellation, and > in particular to understand that cancellation is usually a cooperative, > best-efforts thing rather than the deterministic content-based limiting > that people have in mind. > > Accordingly, I think we should rename to "forEachWithCancel", which is > more suggestive (and, secondarily, the ugly name subtly reinforces that it > serves uncommon use cases.) > From brian.goetz at oracle.com Mon Feb 18 13:32:01 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 16:32:01 -0500 Subject: forEachUntil In-Reply-To: References: <5122942C.7000600@oracle.com> Message-ID: <51229DD1.9000904@oracle.com> Overloading forEach is certainly possible. However, I think that it may well be subject to the same "this is not the method you are looking for" confusion as forEachUntil was (though is probably slightly better in this way.) On 2/18/2013 4:29 PM, Tim Peierls wrote: > Overloading forEach isn't possible? I don't think extra uglification > beyond including a canCancel argument is needed to reinforce the > uncommonness of the usage. > > --tim > > On Mon, Feb 18, 2013 at 3:50 PM, Brian Goetz > wrote: > > Based on further user feedback, I think the name forEachUntil is too > confusing; it makes people (including some members of this expert > group) think that it is supposed to be an encounter-based limiting > operation, rather than an externally-based cancelling operation. > Until seems to be inextricably linked in people's minds to > encounter order, with all the attendant confusion. People seem more > able to understand cancellation, and in particular to understand > that cancellation is usually a cooperative, best-efforts thing > rather than the deterministic content-based limiting that people > have in mind. > > Accordingly, I think we should rename to "forEachWithCancel", which > is more suggestive (and, secondarily, the ugly name subtly > reinforces that it serves uncommon use cases.) 
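Neither `forEachUntil` nor `forEachWithCancel` made it into the final API, so the semantics can only be sketched. Below is a hypothetical helper illustrating the cooperative, best-efforts flavor described above: an external flag is polled between elements, so cancellation is prompt but not exact.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class CancelSketch {

    // Hypothetical helper in the spirit of forEachWithCancel: consume
    // elements until an externally-set flag flips. Cancellation is
    // cooperative and best-efforts -- the flag is polled between
    // elements, so an element already being processed runs to completion.
    public static <T> void forEachWithCancel(Stream<T> stream,
                                             AtomicBoolean cancelled,
                                             Consumer<? super T> action) {
        Iterator<T> it = stream.iterator();
        while (!cancelled.get() && it.hasNext()) {
            action.accept(it.next());
        }
    }

    public static void main(String[] args) {
        AtomicBoolean cancelled = new AtomicBoolean();
        StringBuilder seen = new StringBuilder();
        // Normally another thread (a timeout, a UI button) would set the
        // flag; here the action itself sets it, for a deterministic demo.
        forEachWithCancel(Stream.of("a", "b", "c", "d"), cancelled, s -> {
            seen.append(s);
            if (s.equals("b")) cancelled.set(true);
        });
        System.out.println(seen); // ab
    }
}
```

This is exactly the contrast with `limit(n)`: the cut-off point depends on when the signal arrives, not on encounter order or element count.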
> > From joe.bowbeer at gmail.com Mon Feb 18 15:16:38 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 18 Feb 2013 15:16:38 -0800 Subject: Reducing reduce In-Reply-To: <51228D07.9060004@oracle.com> References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> Message-ID: I wouldn't have thought that boxed had anything to do with "our" 3-arg reduce example. Boxed is by-product of my decision to use a primitive generator (intRange). I could have picked a different generator and then I wouldn't have needed boxed(), yet the 3-arg reduce form would be unaffected. There are lots of applications for prefix-sums. Guy Blelloch listed 13 in 1993, and string-compare just happened to be at the top of the list, where I started: http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf I have a couple of related questions, which I think may be raised by others: 1. Why don't we have a 3-arg mapreduce like Guy Steele discusses in his Parallel-Not talks? http://vimeo.com/6624203 (or a map-scan-zip?) 2. Why don't we have a parallel fold (map+combine) like Rich Hickey added to Clojure? http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html --Joe On Mon, Feb 18, 2013 at 12:20 PM, Brian Goetz wrote: > Circling back to this (i.e., "reducing reduce", redux): > > There are a lot of considerations here, many mostly accidental (e.g., > consequences of erasure and the primitive/reference divide). > > The three-arg functional reduce form is functionally equivalent to the > two-arg form, except that there are some constructions that are more > efficient to handle in the three arg form. However, the best example we > came up with, Joe's string compare, suffers because he had to use boxing. > So we're currently in a place where the best example to support this form > has other defects that make the form hard to support. 
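As an aside on Joe's prefix-sum reference: Java 8 did end up shipping a scan primitive, though outside the Stream API. A minimal illustration with `java.util.Arrays.parallelPrefix`, which cumulates an array in place with an associative operator:

```java
import java.util.Arrays;

public class PrefixSum {
    public static void main(String[] args) {
        // Inclusive prefix sum (a Blelloch-style scan): each slot becomes
        // the sum of itself and everything before it. parallelPrefix
        // cumulates in place with an associative operator and may split
        // the work across threads.
        int[] a = {1, 2, 3, 4, 5};
        Arrays.parallelPrefix(a, Integer::sum);
        System.out.println(Arrays.toString(a)); // [1, 3, 6, 10, 15]
    }
}
```

Associativity is what makes the parallel split legal, the same requirement the combiner in three-arg reduce carries.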
And, any form of > functional reduce on a reference would likely result in a lot of object > creation, so the optimization of eliding some of the mapping would have to > overcome that. > > Further, one can still handle this without boxing using collect() and an > explicit mutable result holder. On the other hand, if/when the language > acquires tuples, it will be a very different story, and this form would > become infinitely more useful. > > So I think the evidence weighs slightly in favor of ditching this form for > now (though I'd feel better if people didn't have to use either an ad-hoc > class or a single-element array as the data box when using the collect() > form.) > > Secondarily, ditching the three-arg form from Stream would remove one > element of support for naming reduce and collect differently; part of the > motivation for a different name was that the three-arg collect and > three-arg reduce overloaded very poorly if they had the same name. However, > I think we should resist the temptation to act on this. I think (a) there > is pedagogical value in separating function and mutable reduce forms, and > (b) if we do this, we slam the door on the more flexible version, which > will badly bite us in a tupled future. > > We might still consider the three-arg version for IntStream. That's the > case where Joe's example works. > > > On 2/11/2013 11:41 AM, Brian Goetz wrote: > >> Now that we've added all the shapes of map() to Stream (map to >> ref/int/long/double), and we've separated functional reduce (currently >> called reduce) from mutable reduce (currently called collect), I think >> that leaves room for taking out one of the reduce methods from Stream: >> >> U reduce(U identity, >> BiFunction accumulator, >> BinaryOperator reducer); >> >> This is the one that confuses everyone anyway, and I don't think we need >> it any more. 
>> >> The argument for having this form instead of discrete map+reduce are: >> - fused map+reduce reduces boxing >> - this three-arg form can also fold filtering into the accumulation >> >> However, since we now have primitive-bearing map methods, and we can do >> filtering before and after the map, is this form really carrying its >> weight? Specifically because people find it counterintuitive, we should >> consider dropping it and guiding people towards map+reduce. >> >> For example, "sum of pages" over a stream of Documents is better written >> as: >> >> docs.map(Document::getPageCount).sum() >> >> rather than >> >> docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) >> >> The big place where we need three-arg reduce is when we're folding into >> a mutable store. But that's now handled by collect(). >> >> Have I missed any use cases that would justify keeping this form? >> >> From brian.goetz at oracle.com Mon Feb 18 15:20:20 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 18:20:20 -0500 Subject: Reducing reduce In-Reply-To: References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> Message-ID: <5122B734.2030903@oracle.com> > 2. Why don't we have a parallel fold (map+combine) like Rich Hickey > added to Clojure? > > http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html I'm confused -- the 3-arg reduce was directly inspired by Rich's Reducers work? From joe.bowbeer at gmail.com Mon Feb 18 15:34:16 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 18 Feb 2013 15:34:16 -0800 Subject: Reducing reduce In-Reply-To: <5122B734.2030903@oracle.com> References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> <5122B734.2030903@oracle.com> Message-ID: Maybe I'm confused. Why are you now trying to eliminate it? On Mon, Feb 18, 2013 at 3:20 PM, Brian Goetz wrote: > 2. Why don't we have a parallel fold (map+combine) like Rich Hickey >> added to Clojure? 
>> >> http://clojure.com/blog/2012/**05/08/reducers-a-library-and-** >> model-for-collection-**processing.html >> > > I'm confused -- the 3-arg reduce was directly inspired by Rich's Reducers > work? > From brian.goetz at oracle.com Mon Feb 18 15:46:04 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 18:46:04 -0500 Subject: Reducing reduce In-Reply-To: References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> <5122B734.2030903@oracle.com> Message-ID: <5122BD3C.1050308@oracle.com> See earlier posting on "reducing reduce." Two reasons: - People find it confusing -- they want to know why they have to specify two functions that "do the same thing." (Because they are thinking sequentially.) And that confusion then infects all the reduce forms. - The cases where it has an advantage over either map+reduce or collect seem to be somewhat limited. And, due to lots of accidental complexity reasons, often involve a fair amount of object creation overhead, which starts to eat into the potential performance advantage. So through the combination of "few people will use it, but it confuses everyone", it seems a reasonable candidate for pruning. On 2/18/2013 6:34 PM, Joe Bowbeer wrote: > Maybe I'm confused. > > Why are you now trying to eliminate it? > > > > On Mon, Feb 18, 2013 at 3:20 PM, Brian Goetz > wrote: > > 2. Why don't we have a parallel fold (map+combine) like Rich Hickey > added to Clojure? > > http://clojure.com/blog/2012/__05/08/reducers-a-library-and-__model-for-collection-__processing.html > > > > I'm confused -- the 3-arg reduce was directly inspired by Rich's > Reducers work? 
> > From joe.bowbeer at gmail.com Mon Feb 18 15:54:14 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 18 Feb 2013 15:54:14 -0800 Subject: Reducing reduce In-Reply-To: <5122BD3C.1050308@oracle.com> References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> <5122B734.2030903@oracle.com> <5122BD3C.1050308@oracle.com> Message-ID: I'm not seeing a good reason not to keep the 3-arg form. I like it for pedagogical reasons. The string-compare is not really a practical example, after all, so I'm not bothered that it uses boxed(). I think there are potential practical uses for the 3-arg form, and its inclusion gives us something to point at when asked by viewers of Guy's talks or users of Clojure. This is not a classic case of YAGNI. The "are going to" case has already been made. The argument that, well, it doesn't work so nicely in Java, will need some more explaining before I buy it. Joe On Mon, Feb 18, 2013 at 3:46 PM, Brian Goetz wrote: > See earlier posting on "reducing reduce." Two reasons: > > - People find it confusing -- they want to know why they have to specify > two functions that "do the same thing." (Because they are thinking > sequentially.) And that confusion then infects all the reduce forms. > > - The cases where it has an advantage over either map+reduce or collect > seem to be somewhat limited. And, due to lots of accidental complexity > reasons, often involve a fair amount of object creation overhead, which > starts to eat into the potential performance advantage. > > So through the combination of "few people will use it, but it confuses > everyone", it seems a reasonable candidate for pruning. > > > > On 2/18/2013 6:34 PM, Joe Bowbeer wrote: > >> Maybe I'm confused. >> >> Why are you now trying to eliminate it? >> >> >> >> On Mon, Feb 18, 2013 at 3:20 PM, Brian Goetz > > wrote: >> >> 2. Why don't we have a parallel fold (map+combine) like Rich >> Hickey >> added to Clojure? 
>> >> http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html >> >> >> I'm confused -- the 3-arg reduce was directly inspired by Rich's >> Reducers work? >> >> >> From Vladimir.Zakharov at gs.com Mon Feb 18 20:06:12 2013 From: Vladimir.Zakharov at gs.com (Zakharov, Vladimir) Date: Mon, 18 Feb 2013 23:06:12 -0500 Subject: forEachUntil In-Reply-To: <5122942C.7000600@oracle.com> References: <5122942C.7000600@oracle.com> Message-ID: Sounds reasonable. "forEachWithCancel", perhaps "forEachUntilCancelled" (either works as it implies an external actor doing the cancellation). -----Original Message----- From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda-libs-spec-experts-bounces at openjdk.java.net] On Behalf Of Brian Goetz Sent: Monday, February 18, 2013 3:51 PM To: lambda-libs-spec-experts at openjdk.java.net Subject: forEachUntil Based on further user feedback, I think the name forEachUntil is too confusing; it makes people (including some members of this expert group) think that it is supposed to be an encounter-based limiting operation, rather than an externally-based cancelling operation. 
Until seems to be inextricably linked in people's minds to encounter order, with all the attendant confusion. People seem more able to understand cancellation, and in particular to understand that cancellation is usually a cooperative, best-efforts thing rather than the deterministic content-based limiting that people have in mind. Accordingly, I think we should rename to "forEachWithCancel", which is more suggestive (and, secondarily, the ugly name subtly reinforces that it serves uncommon use cases.) From brian.goetz at oracle.com Thu Feb 21 07:44:24 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 10:44:24 -0500 Subject: Iterable.stream() Message-ID: <512640D8.2020500@oracle.com> Currently we define stream() and parallelStream() on Collection, with defaults like: default Stream stream() { return Streams.stream( () -> Streams.spliterator(iterator(), size(), Spliterator.SIZED), Spliterator.SIZED); } In other words, if a Collection does not override stream(), it gets the stream based on the iterator. It has been suggested that we could move stream/parallelStream() up to Iterable. They could use an almost identical default, except that they don't know the SIZED flag. (The default in Collection would stay, so existing inheritors of the Collection default wouldn't see any difference. (This is why default methods are virtual.)) Several people have asked why not move these to Iterable, since some APIs return "Iterable" as a least-common-denominator aggregate type, and this would allow those APIs to participate in the stream fun. There are also a handful of other types that implement Iterable, such as Path (Iterable) and DirectoryStream (where we'd added an entries() method, but that would just then become stream()). The sole downside is that it creates (yet another) external dependency from java.lang.Iterable -- now to java.util.stream. Thoughts? 
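The draft default above uses internal `Streams` helpers; the same idea can be sketched against the public API as it shipped (`StreamSupport` plus `Spliterators`). `StreamIterable` below is a hypothetical stand-in for adding the default to `Iterable` itself:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class IterableStream {

    // Hypothetical stand-in for putting the default on Iterable itself.
    // Without a size() to consult, the spliterator reports no SIZED
    // characteristic (unlike the Collection default quoted above).
    public interface StreamIterable<T> extends Iterable<T> {
        default Stream<T> stream() {
            return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(iterator(), 0), false);
        }
    }

    public static void main(String[] args) {
        List<String> backing = Arrays.asList("a", "b", "c");
        StreamIterable<String> si = backing::iterator;
        // Each stream() call pulls a fresh iterator(), so the two
        // streams are independent.
        System.out.println(si.stream().count()); // 3
        System.out.println(si.stream().count()); // 3
    }
}
```

Whether each `stream()` call really is independent depends entirely on the underlying `iterator()` contract, which is the sticking point discussed below.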
From kevinb at google.com Thu Feb 21 08:06:52 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:06:52 -0800 Subject: Iterable.stream() In-Reply-To: <512640D8.2020500@oracle.com> References: <512640D8.2020500@oracle.com> Message-ID: 1. Yes please. 2. And this time I won't hijack the thread. On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz wrote: > Currently we define stream() and parallelStream() on Collection, with > defaults like: > > default Stream stream() { > return Streams.stream( > () -> Streams.spliterator(iterator(), size(), > Spliterator.SIZED), > Spliterator.SIZED); > } > > In other words, if a Collection does not override stream(), it gets the > stream based on the iterator. > > It has been suggested that we could move stream/parallelStream() up to > Iterable. They could use an almost identical default, except that they > don't know the SIZED flag. (The default in Collection would stay, so > existing inheritors of the Collection default wouldn't see any difference. > (This is why default methods are virtual.)) > > Several people have asked why not move these to Iterable, since some APIs > return "Iterable" as a least-common-denominator aggregate type, and this > would allow those APIs to participate in the stream fun. There are also a > handful of other types that implement Iterable, such as Path > (Iterable) and DirectoryStream (where we'd added an entries() method, > but that would just then become stream()). > > The sole downside is that it creates (yet another) external dependency > from java.lang.Iterable -- now to java.util.stream. > > Thoughts? > > -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From joe.bowbeer at gmail.com Thu Feb 21 08:14:14 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 08:14:14 -0800 Subject: Iterable.stream() In-Reply-To: References: <512640D8.2020500@oracle.com> Message-ID: When this question was raised 2 weeks ago, you asked: "" Can we make our best attempt to specify Iterable.stream() better than Iterable.iterator() was? I haven't worked out how to say this yet, but the idea is: - If at all possible to ensure that each call to stream() returns an actual working and independent stream, you really really should do that. - If that's just not possible, the second call to stream() really really should throw ISE. "" Is this something we should address? There was no discussion about this last time. On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" wrote: > 1. Yes please. > 2. And this time I won't hijack the thread. > > > On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz wrote: > >> Currently we define stream() and parallelStream() on Collection, with >> defaults like: >> >> default Stream stream() { >> return Streams.stream( >> () -> Streams.spliterator(iterator()**, size(), >> Spliterator.SIZED), >> Spliterator.SIZED); >> } >> >> In other words, if a Collection does not override stream(), it gets the >> stream based on the iterator. >> >> It has been suggested that we could move stream/parallelStream() up to >> Iterable. They could use an almost identical default, except that they >> don't know the SIZED flag. (The default in Collection would stay, so >> existing inheritors of the Collection default wouldn't see any difference. >> (This is why default methods are virtual.)) >> >> Several people have asked why not move these to Iterable, since some APIs >> return "Iterable" as a least-common-denominator aggregate type, and this >> would allow those APIs to participate in the stream fun. 
There are also a >> handful of other types that implement Iterable, such as Path >> (Iterable) and DirectoryStream (where we'd added an entries() method, >> but that would just then become stream()). >> >> The sole downside is that it creates (yet another) external dependency >> from java.lang.Iterable -- now to java.util.stream. >> >> Thoughts? >> >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From brian.goetz at oracle.com Thu Feb 21 08:27:01 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 11:27:01 -0500 Subject: Iterable.stream() In-Reply-To: References: <512640D8.2020500@oracle.com> Message-ID: <51264AD5.7010009@oracle.com> On the other hand, a big argument in favor of this is the simplicity of building our spliterator() on iterator(). Having stream() have different behavior than iterator() would be weird. The iterator() method might do one of: A: give you a fresh iterator every time B: give you the same iterator every time C: throw With the implementation as proposed, the behavior of stream() in these cases would be: A: give you a fresh stream every time B: give you a fresh stream, but which end up sharing the common Iterator C: throw B leads to unpredictable results, but no more nasty than any other case where B happens. (Joe's idea is a good guideline for writing iterator() methods anyway, maybe we should put that into the doc as a suggestion, asking classes that don't behave this way to be polite and document their deviant behavior.) On 2/21/2013 11:14 AM, Joe Bowbeer wrote: > When this question was raised 2 weeks ago, you asked: > > "" > Can we make our best attempt to specify Iterable.stream() better than > Iterable.iterator() was? > > I haven't worked out how to say this yet, but the idea is: > > - If at all possible to ensure that each call to stream() returns an > actual working and independent stream, you really really should do that. 
> - If that's just not possible, the second call to stream() really really > should throw ISE. > "" > > Is this something we should address? There was no discussion about this > last time. > > On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" > wrote: > > 1. Yes please. > 2. And this time I won't hijack the thread. > > > On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz > wrote: > > Currently we define stream() and parallelStream() on Collection, > with defaults like: > > default Stream stream() { > return Streams.stream( > () -> Streams.spliterator(iterator()__, size(), > Spliterator.SIZED), > Spliterator.SIZED); > } > > In other words, if a Collection does not override stream(), it > gets the stream based on the iterator. > > It has been suggested that we could move stream/parallelStream() > up to Iterable. They could use an almost identical default, > except that they don't know the SIZED flag. (The default in > Collection would stay, so existing inheritors of the Collection > default wouldn't see any difference. (This is why default > methods are virtual.)) > > Several people have asked why not move these to Iterable, since > some APIs return "Iterable" as a least-common-denominator > aggregate type, and this would allow those APIs to participate > in the stream fun. There are also a handful of other types that > implement Iterable, such as Path (Iterable) and > DirectoryStream (where we'd added an entries() method, but that > would just then become stream()). > > The sole downside is that it creates (yet another) external > dependency from java.lang.Iterable -- now to java.util.stream. > > Thoughts? > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > > From kevinb at google.com Thu Feb 21 08:33:09 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:33:09 -0800 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> Message-ID: Tardy, but: the Googlers I ran this by all felt just fine with "mapInto". Sure, you can map *multiple, *but that fact just didn't seem overly necessary to force into the name. On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer wrote: > flattenInto gets my vote > On Feb 17, 2013 11:09 AM, "Remi Forax" wrote: > >> On 02/17/2013 08:07 PM, Brian Goetz wrote: >> >>> flattenInto seems the best so far. >>> >> >> +1 >> >> R?mi >> >> >>> On 2/17/2013 9:36 AM, Tim Peierls wrote: >>> >>>> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >>> > wrote: >>>> >>>> mapAndFlattenInto is a little to verbose for me, mapAndFlat ? >>>> >>>> >>>> No, has to be a verb. >>>> >>>> I'd still understand flattenInto, leaving the mapping part to be implied >>>> by the type name. >>>> >>>> --tim >>>> >>> >> -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu Feb 21 08:35:58 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 11:35:58 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> Message-ID: <51264CEE.5090003@oracle.com> Is mapInto better than flattenInto? Still trivial to change at this point. On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > Tardy, but: the Googlers I ran this by all felt just fine with > "mapInto". Sure, you can map /multiple, /but that fact just didn't seem > overly necessary to force into the name. 
> > On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > wrote: > > flattenInto gets my vote > > On Feb 17, 2013 11:09 AM, "Remi Forax" > wrote: > > On 02/17/2013 08:07 PM, Brian Goetz wrote: > > flattenInto seems the best so far. > > > +1 > > R?mi > > > On 2/17/2013 9:36 AM, Tim Peierls wrote: > > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > >> > wrote: > > mapAndFlattenInto is a little to verbose for me, > mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping > part to be implied > by the type name. > > --tim > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From joe.bowbeer at gmail.com Thu Feb 21 08:38:31 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 08:38:31 -0800 Subject: Iterable.stream() In-Reply-To: <51264AD5.7010009@oracle.com> References: <512640D8.2020500@oracle.com> <51264AD5.7010009@oracle.com> Message-ID: I was reposting Kevin's earlier question and idea. Delimited with "". On Feb 21, 2013 8:27 AM, "Brian Goetz" wrote: > On the other hand, a big argument in favor of this is the simplicity of > building our spliterator() on iterator(). Having stream() have different > behavior than iterator() would be weird. > > The iterator() method might do one of: > A: give you a fresh iterator every time > B: give you the same iterator every time > C: throw > > With the implementation as proposed, the behavior of stream() in these > cases would be: > A: give you a fresh stream every time > B: give you a fresh stream, but which end up sharing the common Iterator > C: throw > > B leads to unpredictable results, but no more nasty than any other case > where B happens. > > (Joe's idea is a good guideline for writing iterator() methods anyway, > maybe we should put that into the doc as a suggestion, asking classes that > don't behave this way to be polite and document their deviant behavior.) 
> > On 2/21/2013 11:14 AM, Joe Bowbeer wrote: > >> When this question was raised 2 weeks ago, you asked: >> >> "" >> Can we make our best attempt to specify Iterable.stream() better than >> Iterable.iterator() was? >> >> I haven't worked out how to say this yet, but the idea is: >> >> - If at all possible to ensure that each call to stream() returns an >> actual working and independent stream, you really really should do that. >> - If that's just not possible, the second call to stream() really really >> should throw ISE. >> "" >> >> Is this something we should address? There was no discussion about this >> last time. >> >> On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" > > wrote: >> >> 1. Yes please. >> 2. And this time I won't hijack the thread. >> >> >> On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz > > wrote: >> >> Currently we define stream() and parallelStream() on Collection, >> with defaults like: >> >> default Stream stream() { >> return Streams.stream( >> () -> Streams.spliterator(iterator()**__, size(), >> Spliterator.SIZED), >> Spliterator.SIZED); >> } >> >> In other words, if a Collection does not override stream(), it >> gets the stream based on the iterator. >> >> It has been suggested that we could move stream/parallelStream() >> up to Iterable. They could use an almost identical default, >> except that they don't know the SIZED flag. (The default in >> Collection would stay, so existing inheritors of the Collection >> default wouldn't see any difference. (This is why default >> methods are virtual.)) >> >> Several people have asked why not move these to Iterable, since >> some APIs return "Iterable" as a least-common-denominator >> aggregate type, and this would allow those APIs to participate >> in the stream fun. There are also a handful of other types that >> implement Iterable, such as Path (Iterable) and >> DirectoryStream (where we'd added an entries() method, but that >> would just then become stream()). 
>> >> The sole downside is that it creates (yet another) external >> dependency from java.lang.Iterable -- now to java.util.stream. >> >> Thoughts? >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >> >> >> From kevinb at google.com Thu Feb 21 08:42:56 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:42:56 -0800 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: I believe the mapping aspect is an order of magnitude more relevant than the flattening aspect. The way we've designed the API, nothing is exactly being *flattened*, anyway. It's just that multiple results may be emitted. On Thu, Feb 21, 2013 at 8:35 AM, Brian Goetz wrote: > Is mapInto better than flattenInto? Still trivial to change at this point. > > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > >> Tardy, but: the Googlers I ran this by all felt just fine with >> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >> >> overly necessary to force into the name. >> >> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > **> wrote: >> >> flattenInto gets my vote >> >> On Feb 17, 2013 11:09 AM, "Remi Forax" > > wrote: >> >> On 02/17/2013 08:07 PM, Brian Goetz wrote: >> >> flattenInto seems the best so far. >> >> >> +1 >> >> R?mi >> >> >> On 2/17/2013 9:36 AM, Tim Peierls wrote: >> >> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >> >> >> >> >> wrote: >> >> mapAndFlattenInto is a little to verbose for me, >> mapAndFlat ? >> >> >> No, has to be a verb. >> >> I'd still understand flattenInto, leaving the mapping >> part to be implied >> by the type name. >> >> --tim >> >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> >> > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Thu Feb 21 08:41:37 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:41:37 -0800 Subject: Iterable.stream() In-Reply-To: References: <512640D8.2020500@oracle.com> Message-ID: On Thu, Feb 21, 2013 at 8:14 AM, Joe Bowbeer wrote: > Is this something we should address? There was no discussion about this > last time. > I still think it is. It's true that anyone who inherits the *default *stream() will get one that's only as good as their (possibly lousy) iterator always is, but that's the best we can do. > On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" wrote: > >> 1. Yes please. >> 2. And this time I won't hijack the thread. >> >> >> On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz wrote: >> >>> Currently we define stream() and parallelStream() on Collection, with >>> defaults like: >>> >>> default Stream stream() { >>> return Streams.stream( >>> () -> Streams.spliterator(iterator()**, size(), >>> Spliterator.SIZED), >>> Spliterator.SIZED); >>> } >>> >>> In other words, if a Collection does not override stream(), it gets the >>> stream based on the iterator. >>> >>> It has been suggested that we could move stream/parallelStream() up to >>> Iterable. They could use an almost identical default, except that they >>> don't know the SIZED flag. (The default in Collection would stay, so >>> existing inheritors of the Collection default wouldn't see any difference. >>> (This is why default methods are virtual.)) >>> >>> Several people have asked why not move these to Iterable, since some >>> APIs return "Iterable" as a least-common-denominator aggregate type, and >>> this would allow those APIs to participate in the stream fun. 
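The default sketched in Brian's proposal above is built on the in-flight `Streams.stream`/`Streams.spliterator` helpers. As a hypothetical sketch of what an Iterable-based default could look like using the spliterator utilities that eventually shipped in JDK 8 (`StreamSupport`/`Spliterators`; note that in the final API `stream()` stayed on `Collection`, so `StreamIterable` below is an invented illustration, not a real JDK type):

```java
import java.util.Arrays;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical: what a default stream() on Iterable could look like,
// built on iterator() as described in the thread. A bare Iterable has
// no size information, so the spliterator cannot report SIZED.
interface StreamIterable<T> extends Iterable<T> {
    default Stream<T> stream() {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator(), 0),
                false); // sequential; a parallelStream() default would pass true
    }
}

public class IterableStreamDemo {
    public static void main(String[] args) {
        // Iterable is a single-abstract-method type, so a lambda works here.
        StreamIterable<String> it =
                () -> Arrays.asList("a", "bb", "ccc").iterator();
        long n = it.stream().filter(s -> s.length() > 1).count();
        System.out.println(n); // 2
    }
}
```

As the thread notes, such a stream is only as good as the underlying iterator(): if iterator() returns a shared or one-shot iterator, each call to stream() silently inherits that behavior.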
There are >>> also a handful of other types that implement Iterable, such as Path >>> (Iterable) and DirectoryStream (where we'd added an entries() method, >>> but that would just then become stream()). >>> >>> The sole downside is that it creates (yet another) external dependency >>> from java.lang.Iterable -- now to java.util.stream. >>> >>> Thoughts? >>> >>> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com >> > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From Donald.Raab at gs.com Thu Feb 21 08:42:22 2013 From: Donald.Raab at gs.com (Raab, Donald) Date: Thu, 21 Feb 2013 11:42:22 -0500 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C3A898A8@GSCMAMP09EX.firmwide.corp.gs.com> Is there anything wrong with flatMapInto? Apologies if this was already covered and dismissed. > -----Original Message----- > From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda- > libs-spec-experts-bounces at openjdk.java.net] On Behalf Of Brian Goetz > Sent: Thursday, February 21, 2013 11:36 AM > To: Kevin Bourrillion > Cc: lambda-libs-spec-experts at openjdk.java.net > Subject: Re: FlatMapper > > Is mapInto better than flattenInto? Still trivial to change at this > point. > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > > Tardy, but: the Googlers I ran this by all felt just fine with > > "mapInto". Sure, you can map /multiple, /but that fact just didn't > > seem overly necessary to force into the name. > > > > On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > > wrote: > > > > flattenInto gets my vote > > > > On Feb 17, 2013 11:09 AM, "Remi Forax" > > wrote: > > > > On 02/17/2013 08:07 PM, Brian Goetz wrote: > > > > flattenInto seems the best so far. 
> > > > > > +1 > > > > R?mi > > > > > > On 2/17/2013 9:36 AM, Tim Peierls wrote: > > > > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > > > mlv.fr>>> > > wrote: > > > > mapAndFlattenInto is a little to verbose for me, > > mapAndFlat ? > > > > > > No, has to be a verb. > > > > I'd still understand flattenInto, leaving the mapping > > part to be implied > > by the type name. > > > > --tim > > > > > > > > > > > > -- > > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > > From joe.bowbeer at gmail.com Thu Feb 21 08:42:27 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 08:42:27 -0800 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: I prefer flattenInto. On Feb 21, 2013 8:36 AM, "Brian Goetz" wrote: > Is mapInto better than flattenInto? Still trivial to change at this point. > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > >> Tardy, but: the Googlers I ran this by all felt just fine with >> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >> overly necessary to force into the name. >> >> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > **> wrote: >> >> flattenInto gets my vote >> >> On Feb 17, 2013 11:09 AM, "Remi Forax" > > wrote: >> >> On 02/17/2013 08:07 PM, Brian Goetz wrote: >> >> flattenInto seems the best so far. >> >> >> +1 >> >> R?mi >> >> >> On 2/17/2013 9:36 AM, Tim Peierls wrote: >> >> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >> >> >> >> wrote: >> >> mapAndFlattenInto is a little to verbose for me, >> mapAndFlat ? >> >> >> No, has to be a verb. >> >> I'd still understand flattenInto, leaving the mapping >> part to be implied >> by the type name. >> >> --tim >> >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> >> > From kevinb at google.com Thu Feb 21 08:43:36 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:43:36 -0800 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: emit()? On Thu, Feb 21, 2013 at 8:42 AM, Kevin Bourrillion wrote: > I believe the mapping aspect is an order of magnitude more relevant than > the flattening aspect. The way we've designed the API, nothing is exactly > being *flattened*, anyway. It's just that multiple results may be > emitted. > > > On Thu, Feb 21, 2013 at 8:35 AM, Brian Goetz wrote: > >> Is mapInto better than flattenInto? Still trivial to change at this >> point. >> >> >> On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: >> >>> Tardy, but: the Googlers I ran this by all felt just fine with >>> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >>> >>> overly necessary to force into the name. >>> >>> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer >> **> wrote: >>> >>> flattenInto gets my vote >>> >>> On Feb 17, 2013 11:09 AM, "Remi Forax" >> > wrote: >>> >>> On 02/17/2013 08:07 PM, Brian Goetz wrote: >>> >>> flattenInto seems the best so far. >>> >>> >>> +1 >>> >>> R?mi >>> >>> >>> On 2/17/2013 9:36 AM, Tim Peierls wrote: >>> >>> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >>> >>> >> >>> >>> wrote: >>> >>> mapAndFlattenInto is a little to verbose for me, >>> mapAndFlat ? >>> >>> >>> No, has to be a verb. >>> >>> I'd still understand flattenInto, leaving the mapping >>> part to be implied >>> by the type name. >>> >>> --tim >>> >>> >>> >>> >>> >>> -- >>> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >>> >>> >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From tim at peierls.net Thu Feb 21 08:46:34 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 21 Feb 2013 11:46:34 -0500 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: Yes, mapInto is better than flattenInto. On Thu, Feb 21, 2013 at 11:35 AM, Brian Goetz wrote: > Is mapInto better than flattenInto? Still trivial to change at this point. > > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > >> Tardy, but: the Googlers I ran this by all felt just fine with >> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >> >> overly necessary to force into the name. >> >> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > **> wrote: >> >> flattenInto gets my vote >> >> On Feb 17, 2013 11:09 AM, "Remi Forax" > > wrote: >> >> On 02/17/2013 08:07 PM, Brian Goetz wrote: >> >> flattenInto seems the best so far. >> >> >> +1 >> >> R?mi >> >> >> On 2/17/2013 9:36 AM, Tim Peierls wrote: >> >> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >> >> >> >> >> wrote: >> >> mapAndFlattenInto is a little to verbose for me, >> mapAndFlat ? >> >> >> No, has to be a verb. >> >> I'd still understand flattenInto, leaving the mapping >> part to be implied >> by the type name. >> >> --tim >> >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> >> > From joe.bowbeer at gmail.com Thu Feb 21 08:47:35 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 08:47:35 -0800 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: Let's go back to mapAndFlattenInto and try this exercise again! Last time we ended up at flattenInto, but maybe we took a wrong turn near the start? On Feb 21, 2013 8:43 AM, "Kevin Bourrillion" wrote: > emit()? > > > On Thu, Feb 21, 2013 at 8:42 AM, Kevin Bourrillion wrote: > >> I believe the mapping aspect is an order of magnitude more relevant than >> the flattening aspect. The way we've designed the API, nothing is exactly >> being *flattened*, anyway. It's just that multiple results may be >> emitted. >> >> >> On Thu, Feb 21, 2013 at 8:35 AM, Brian Goetz wrote: >> >>> Is mapInto better than flattenInto? Still trivial to change at this >>> point. >>> >>> >>> On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: >>> >>>> Tardy, but: the Googlers I ran this by all felt just fine with >>>> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >>>> >>>> overly necessary to force into the name. >>>> >>>> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer >>> **> wrote: >>>> >>>> flattenInto gets my vote >>>> >>>> On Feb 17, 2013 11:09 AM, "Remi Forax" >>> > wrote: >>>> >>>> On 02/17/2013 08:07 PM, Brian Goetz wrote: >>>> >>>> flattenInto seems the best so far. >>>> >>>> >>>> +1 >>>> >>>> R?mi >>>> >>>> >>>> On 2/17/2013 9:36 AM, Tim Peierls wrote: >>>> >>>> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >>>> >>>> >> >>>> >>>> wrote: >>>> >>>> mapAndFlattenInto is a little to verbose for me, >>>> mapAndFlat ? >>>> >>>> >>>> No, has to be a verb. 
>>>> >>>> I'd still understand flattenInto, leaving the mapping >>>> part to be implied >>>> by the type name. >>>> >>>> --tim >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >>>> >>>> >>> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com >> > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From spullara at gmail.com Thu Feb 21 08:56:35 2013 From: spullara at gmail.com (Sam Pullara) Date: Thu, 21 Feb 2013 08:56:35 -0800 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: I like mapInto as well. Sam On Feb 21, 2013, at 8:46 AM, Tim Peierls wrote: > Yes, mapInto is better than flattenInto. > > On Thu, Feb 21, 2013 at 11:35 AM, Brian Goetz wrote: > Is mapInto better than flattenInto? Still trivial to change at this point. > > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > Tardy, but: the Googlers I ran this by all felt just fine with > "mapInto". Sure, you can map /multiple, /but that fact just didn't seem > > overly necessary to force into the name. > > On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > wrote: > > flattenInto gets my vote > > On Feb 17, 2013 11:09 AM, "Remi Forax" > wrote: > > On 02/17/2013 08:07 PM, Brian Goetz wrote: > > flattenInto seems the best so far. > > > +1 > > R?mi > > > On 2/17/2013 9:36 AM, Tim Peierls wrote: > > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > >> > > wrote: > > mapAndFlattenInto is a little to verbose for me, > mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping > part to be implied > by the type name. > > --tim > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > > From joe.bowbeer at gmail.com Thu Feb 21 09:18:01 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 09:18:01 -0800 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: Last time we were looking for a descriptive name not necessarily a great name. flattenInto does a good job of referencing its interface, while mapInto is more ambiguous in that respect. Is mapInto more easily confused with other names such as 'collect'? Is mapInto better than flattenInto? Still trivial to change at this point. On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > Tardy, but: the Googlers I ran this by all felt just fine with > "mapInto". Sure, you can map /multiple, /but that fact just didn't seem > overly necessary to force into the name. > > On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer **> wrote: > > flattenInto gets my vote > > On Feb 17, 2013 11:09 AM, "Remi Forax" > wrote: > > On 02/17/2013 08:07 PM, Brian Goetz wrote: > > flattenInto seems the best so far. > > > +1 > > R?mi > > > On 2/17/2013 9:36 AM, Tim Peierls wrote: > > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > >> > wrote: > > mapAndFlattenInto is a little to verbose for me, > mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping > part to be implied > by the type name. > > --tim > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > > From tim at peierls.net Thu Feb 21 09:39:27 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 21 Feb 2013 12:39:27 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: On Thu, Feb 21, 2013 at 12:18 PM, Joe Bowbeer wrote: > Is mapInto more easily confused with other names such as 'collect'? > No, I don't think so. As Kevin pointed out, there's not enough new going on here to deserve a fancy new (scary) name. We're taking a stream of T values and *map*ping each one to zero or more U instances and putting them *into* a U consumer. In the general case, it's not really flattening (or exploding), even though what I would describe in those terms is a special case of this. FlatMapper is a *synecdoche*, a more specific term standing in for a more general concept, and if it makes Scala devotees happy, then I guess it does no harm. But (and I know it must seem like I'm reversing myself, since I suggested "flattenInto") I don't see a need to repeat the favor in the method name. --tim From dl at cs.oswego.edu Thu Feb 21 09:44:43 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 21 Feb 2013 12:44:43 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: <51265D0B.6010501@cs.oswego.edu> On 02/21/13 12:39, Tim Peierls wrote: > FlatMapper is a /synecdoche/, a more specific term standing in for a more > general concept, It's always a great day when you can use "synecdoche"! (Not only because it fits, but the imagery of plays within movies within ...) 
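For context on the naming debate above: the operation in question (FlatMapper, whether its method ends up as mapInto, flattenInto, or emit) maps each input element to zero or more output elements pushed into a downstream consumer. The FlatMapper-based overloads were still in flux at this point; a rough sketch of the same "zero-or-more results per element" behavior, expressed with the `Stream.flatMap` form that ultimately shipped in JDK 8:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapDemo {
    // Each input element may emit zero or more output elements -- the
    // behavior the FlatMapper interface expresses by handing the mapper
    // a downstream consumer. Here it is written with a Stream-returning
    // function instead.
    static List<Integer> explode(List<Integer> xs) {
        return xs.stream()
                 .flatMap(x -> x == 0 ? Stream.<Integer>empty()  // zero results
                                      : Stream.of(x, -x))        // multiple results
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(explode(Arrays.asList(1, 0, 2))); // [1, -1, 2, -2]
    }
}
```

Kevin's point upthread is visible here: nothing nested is being "flattened" by the user; each element simply emits several results, which is why the mapping aspect arguably dominates the name.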
-Doug From brian.goetz at oracle.com Thu Feb 21 11:17:58 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 14:17:58 -0500 Subject: Code review request Message-ID: <512672E6.1050708@oracle.com> At http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ I've posted a webrev for about half the classes in java.util.stream. None of these are public classes, so there are no public API issues here, but plenty of internal API issues, naming issues (ooh, a bikeshed), and code quality issues. From forax at univ-mlv.fr Thu Feb 21 11:33:37 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 21 Feb 2013 20:33:37 +0100 Subject: Iterable.stream() In-Reply-To: References: <512640D8.2020500@oracle.com> Message-ID: <51267691.7000206@univ-mlv.fr> On 02/21/2013 05:41 PM, Kevin Bourrillion wrote: > On Thu, Feb 21, 2013 at 8:14 AM, Joe Bowbeer > wrote: > > Is this something we should address? There was no discussion > about this last time. > > I still think it is. It's true that anyone who inherits the /default > /stream() will get one that's only as good as their (possibly lousy) > iterator always is, but that's the best we can do. We provide a way to get a Spliterator from an Iterator and a Stream from a Spliterator, so we already provide a way to get a Stream from an Iterable but with no way to get a better stream if the Iterable is a Collection. so Iterable should have a method stream(), default methods are virtual exactly for that case. R?mi > > On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" > wrote: > > 1. Yes please. > 2. And this time I won't hijack the thread. 
> > > On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz > > wrote: > > Currently we define stream() and parallelStream() on > Collection, with defaults like: > > default Stream stream() { > return Streams.stream( > () -> Streams.spliterator(iterator(), size(), > Spliterator.SIZED), > Spliterator.SIZED); > } > > In other words, if a Collection does not override > stream(), it gets the stream based on the iterator. > > It has been suggested that we could move > stream/parallelStream() up to Iterable. They could use an > almost identical default, except that they don't know the > SIZED flag. (The default in Collection would stay, so > existing inheritors of the Collection default wouldn't see > any difference. (This is why default methods are virtual.)) > > Several people have asked why not move these to Iterable, > since some APIs return "Iterable" as a > least-common-denominator aggregate type, and this would > allow those APIs to participate in the stream fun. There > are also a handful of other types that implement Iterable, > such as Path (Iterable) and DirectoryStream (where > we'd added an entries() method, but that would just then > become stream()). > > The sole downside is that it creates (yet another) > external dependency from java.lang.Iterable -- now to > java.util.stream. > > Thoughts? > > > > > -- > Kevin Bourrillion | Java Librarian | Google, > Inc. |kevinb at google.com > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From brian.goetz at oracle.com Thu Feb 21 15:01:30 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 18:01:30 -0500 Subject: Collectors inventory Message-ID: <5126A74A.3040509@oracle.com> As I promised a long time ago, here's an overview of what's in Collectors currently. 
There are 12 basic forms:

- toCollection(ctor)
- toList()
- toSet()
- toStringBuilder()
- toStringJoiner(delimiter)
- to{Long,Double}Statistics

- groupingBy(classifier, mapFactory, downstream collector)
- groupingReduce(classifier, mapFactory, mapper, reducer)
- mapping(mappingFn, downstream collector)
- joiningWith(mappingFunction, mergeFunction, mapFactory)
- partitioningBy(predicate, downstream collector)
- partitioningReduce(predicate, mapper, reducer)

The toXxx forms should be obvious.

Mapping has four versions, analogous to Stream.map:
- mapping(T -> U, Collector)
- mapping(T -> int, Collector.OfInt)
- mapping(T -> long, Collector.OfLong)
- mapping(T -> double, Collector.OfDouble)

GroupingBy has four forms:
- groupingBy(T->K) -- standard groupBy, values of resulting Map are Collection<T>
- Same, but with explicit constructors for map and for rows (so you can produce, say, a TreeMap<K, Collection<T>> and not just a Map<K, Collection<T>>)
- groupingBy(T->K, Collector) -- multi-level groupBy, where downstream is another Collector
- Same, but with explicit ctor for map

GroupingReduce has four forms:
- groupingReduce(T->K, BinaryOperator) // simple reduce
- groupingReduce(T->K, Function, BinaryOperator) // map-reduce
- above two with explicit map ctors

JoiningWith has four forms:
- joiningWith(T->U)
- same, but with explicit Map ctor
- same, but with merge function for handling duplicates
- same, with both explicit map ctor and merge function

PartitioningBy has three forms:
- partitioningBy(Predicate)
- Same, but with explicit constructor for Collection (so you can get a Map<Boolean, TreeSet<T>>)
- partitioningBy(Predicate, Collector) // multi-level

PartitioningReduce has two forms:
- predicate + reducer
- predicate + mapper + reducer

Impl note: in any category, all but one are one-liners that delegate to the general form. Plus, all the Map-bearing ones have a concurrent and non-concurrent version.
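A sketch of the groupingBy and partitioningBy forms from the inventory above, written against the collector names that eventually shipped in JDK 8 (assumed here; several of the inventoried names were later renamed, e.g. the groupingReduce forms became reducing-based overloads and joiningWith became toMap):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.partitioningBy;
import static java.util.stream.Collectors.toList;

public class GroupingDemo {
    // Multi-level form: classifier plus a downstream collector.
    static Map<Character, Long> countByInitial(List<String> words) {
        return words.stream().collect(groupingBy(w -> w.charAt(0), counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "cherry", "blueberry");

        // Simple groupBy: values of the resulting Map are List<String>
        Map<Character, List<String>> byInitial =
                words.stream().collect(groupingBy(w -> w.charAt(0)));

        // Explicit map ctor: produce a TreeMap, not just some Map
        TreeMap<Character, List<String>> sorted =
                words.stream().collect(groupingBy(w -> w.charAt(0), TreeMap::new, toList()));

        // partitioningBy: two-way split keyed by Boolean
        Map<Boolean, List<String>> partitioned =
                words.stream().collect(partitioningBy(w -> w.length() > 5));

        System.out.println(countByInitial(words)); // e.g. {a=2, b=2, c=1}
        System.out.println(sorted.firstKey());     // a
        System.out.println(partitioned.get(false)); // [apple]
    }
}
```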
From kevinb at google.com Fri Feb 22 08:06:17 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 22 Feb 2013 08:06:17 -0800 Subject: A few very minor library issues Message-ID: Just a few little things. 1. I feel the Stream methods findFirst() and findAny() can really be named just first() and any(). The "find" is just odd and doesn't do enough. Failing that, I'd go for firstElement() / anyElement(). 2. I like Stream.substream(), but Stream.sub*S*tream() is undeniably consistent with the collections API (subSet, etc.; sure, String.substring() doesn't follow that, but it's "farther away"). I'm actually on the fence here, because I think "substream" is strictly the *correct* way to camel-case the word "substream"... 3. Are we concerned that the name Map.computeIfAbsent() obscures what the mutative effect on the map is? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 22 08:25:41 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 22 Feb 2013 11:25:41 -0500 Subject: A few very minor library issues In-Reply-To: References: Message-ID: <51279C05.1090007@oracle.com> > 1. I feel the Stream methods findFirst() and findAny() can really be > named just first() and any(). The "find" is just odd and doesn't do > enough. Failing that, I'd go for firstElement() / anyElement(). Agree find is a little weird. I am fine with first() but a little squeamish about any(), just because people who have not yet been through the parallelism meat grinder already find "findAny" weird ("why is it different from findFirst?") Also OK with firstElement() and anyElement(). > 2. I like Stream.substream(), but Stream.sub*S*tream() is undeniably > consistent with the collections API (subSet, etc.; sure, > String.substring() doesn't follow that, but it's "farther away"). I'm > actually on the fence here, because I think "substream" is strictly the > /correct/ way to camel-case the word "substream"... 
No strong opinion here. What do people want? From joe.bowbeer at gmail.com Fri Feb 22 08:56:28 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Fri, 22 Feb 2013 08:56:28 -0800 Subject: A few very minor library issues In-Reply-To: References: Message-ID: I'm fine with the find* methods as they are. It wasn't a problem finding them and using them in the examples I wrote. The common prefix is a help for grouping these common methods, and these both return an Option thing, so the common prefix is also a helpful reminder there. Just so you know, after we have discussed names several times over several months and I have already coded the choices into examples, I tend to feel pretty good about the names and am reluctant to want to change them:-) I like substream, too. I'm OK with computeIfAbsent. After years of discussion, it is what it is. On Feb 22, 2013 8:06 AM, "Kevin Bourrillion" wrote: > Just a few little things. > > 1. I feel the Stream methods findFirst() and findAny() can really be named > just first() and any(). The "find" is just odd and doesn't do enough. > Failing that, I'd go for firstElement() / anyElement(). > > 2. I like Stream.substream(), but Stream.sub*S*tream() is undeniably > consistent with the collections API (subSet, etc.; sure, String.substring() > doesn't follow that, but it's "farther away"). I'm actually on the fence > here, because I think "substream" is strictly the *correct* way to > camel-case the word "substream"... > > 3. Are we concerned that the name Map.computeIfAbsent() obscures what the > mutative effect on the map is? > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From kevinb at google.com Fri Feb 22 08:59:10 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 22 Feb 2013 08:59:10 -0800 Subject: A few very minor library issues In-Reply-To: <51279C05.1090007@oracle.com> References: <51279C05.1090007@oracle.com> Message-ID: On Fri, Feb 22, 2013 at 8:25 AM, Brian Goetz wrote: 1. 
I feel the Stream methods findFirst() and findAny() can really be >> named just first() and any(). The "find" is just odd and doesn't do >> enough. Failing that, I'd go for firstElement() / anyElement(). >> > > Agree find is a little weird. I am fine with first() but a little > squeamish about any(), just because people who have not yet been through > the parallelism meat grinder already find "findAny" weird ("why is it > different from findFirst?") > That seems like a concern that's roughly the same whether they have a common prefix or suffix or not. Though I do see the minor point about a common prefix grouping them together so that you at least have to ponder the difference up front... > Also OK with firstElement() and anyElement(). > > 2. I like Stream.substream(), but Stream.sub*S*tream() is undeniably >> >> consistent with the collections API (subSet, etc.; sure, >> String.substring() doesn't follow that, but it's "farther away"). I'm >> actually on the fence here, because I think "substream" is strictly the >> /correct/ way to camel-case the word "substream"... >> > > No strong opinion here. What do people want? > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 22 15:32:14 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 22 Feb 2013 18:32:14 -0500 Subject: Initial spec review for Stream Message-ID: <5127FFFE.5010407@oracle.com> I've put up some very rough proto-spec for Stream and the stream package-info at: http://cr.openjdk.java.net/~briangoetz/JDK-8008682/doc/. (I've included the whole package but am only requesting comments on these two files for now, as the rest are incomplete.) 
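[Editorial sketch inserted for context: the behavioral difference behind the findFirst/findAny naming debate is about encounter order under parallelism. The class name below is invented for illustration; the stream methods are the ones under discussion.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FindDemo {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5);

        // findFirst() is constrained by encounter order: it must return the
        // first match in list order, even when the stream runs in parallel.
        Optional<Integer> first = nums.parallelStream().filter(n -> n > 2).findFirst();

        // findAny() may return whichever match a parallel worker reaches
        // first; only "some element greater than 2" is guaranteed.
        Optional<Integer> any = nums.parallelStream().filter(n -> n > 2).findAny();

        System.out.println("first=" + first.get());
        System.out.println("anyMatches=" + (any.isPresent() && any.get() > 2));
    }
}
```

This is why "findAny" reads oddly next to "findFirst": the common prefix groups the two, but their contracts only diverge when the pipeline is parallel.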
There's definitely lots of stuff missing, including:

- Describe the difference between sequential and parallel streams
- More general information about reduce, better definitions for associativity, more description of how reduce employs parallelism, more examples
- Role of stream flags in various operations, specifically ordering
- Non-interference and constraints on lambda characteristics (e.g., side-effect-freedom)
- collectUnordered

But it's a start. Comments please! From forax at univ-mlv.fr Sat Feb 23 01:40:37 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 23 Feb 2013 10:40:37 +0100 Subject: Collectors inventory In-Reply-To: <5126A74A.3040509@oracle.com> References: <5126A74A.3040509@oracle.com> Message-ID: <51288E95.4010903@univ-mlv.fr> On 02/22/2013 12:01 AM, Brian Goetz wrote: > As I promised a long time ago, here's an overview of what's in > Collectors currently. I think there are too many methods in Collectors; we should restrain ourselves to 2 forms (3 max).
>
> There are 12 basic forms:
> - toCollection(ctor)
> - toList()
> - toSet()
> - toStringBuilder()
> - toStringJoiner(delimiter)
> - to{Long,Double}Statistics
>
> - groupingBy(classifier, mapFactory, downstream collector)
> - groupingReduce(classifier, mapFactory, mapper, reducer)
> - mapping(mappingFn, downstream collector)
> - joiningWith(mappingFunction, mergeFunction, mapFactory)
> - partitioningBy(predicate, downstream collector)
> - partitioningReduce(predicate, mapper, reducer)
>
> The toXxx forms should be obvious.
> Mapping has four versions, analogous to Stream.map:
> - mapping(T -> U, Collector)
> - mapping(T -> int, Collector.OfInt)
> - mapping(T -> long, Collector.OfLong)
> - mapping(T -> double, Collector.OfDouble)
>
> GroupingBy has four forms:
> - groupingBy(T->K) -- standard groupBy, values of resulting Map are Collection<T>
> - Same, but with explicit constructors for map and for rows (so you can produce, say, a TreeMap<K, List<T>> and not just a Map<K, Collection<T>>)
> - groupingBy(T->K, Collector) -- multi-level groupBy, where downstream is another Collector
> - Same, but with explicit ctor for map

You can remove the third one, given you have the one with an explicit constructor.

> GroupingReduce has four forms:
> - groupingReduce(T->K, BinaryOperator) // simple reduce
> - groupingReduce(T->K, Function, BinaryOperator) // map-reduce
> - above two with explicit map ctors

Keep only the ones with explicit constructors.

> JoiningWith has four forms:
> - joiningWith(T->U)
> - same, but with explicit Map ctor
> - same, but with merge function for handling duplicates
> - same, with both explicit map ctor and merge function

Remove the third one.

> PartitioningBy has three forms:
> - partitioningBy(Predicate)
> - Same, but with explicit constructor for Collection (so you can get a Map<Boolean, List<T>>)
> - partitioningBy(Predicate, Collector) // multi-level
>
> PartitioningReduce has two forms:
> - predicate + reducer
> - predicate + mapper + reducer
>
> Impl note: in any category, all but one are one-liners that delegate to the general form.
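[Editorial sketch inserted for context: the inventory above predates the final API, but the groupingBy/mapping/partitioningBy shapes can be illustrated with the names that java.util.stream.Collectors ultimately shipped. The word list and class name are invented; the groupingReduce, joiningWith, and toStringJoiner forms are not shown.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CollectorsDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");

        // groupingBy(classifier): the simple form, values are List<String>.
        Map<Character, List<String>> byInitial = words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0)));
        System.out.println(byInitial.get('a'));

        // groupingBy with an explicit map factory and a downstream collector:
        // the "multi-level groupBy" form, here producing a TreeMap of lengths.
        TreeMap<Character, List<Integer>> lengths = words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), TreeMap::new,
                        Collectors.mapping(String::length, Collectors.toList())));
        System.out.println(lengths);

        // partitioningBy(predicate): a two-way split keyed by Boolean.
        Map<Boolean, List<String>> shortWords = words.stream()
                .collect(Collectors.partitioningBy(w -> w.length() <= 6));
        System.out.println(shortWords.get(true));
    }
}
```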
Rémi From forax at univ-mlv.fr Sat Feb 23 02:51:41 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 23 Feb 2013 11:51:41 +0100 Subject: Code review request In-Reply-To: <512672E6.1050708@oracle.com> References: <512672E6.1050708@oracle.com> Message-ID: <51289F3D.1010609@univ-mlv.fr> On 02/21/2013 08:17 PM, Brian Goetz wrote: > At > http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ > > I've posted a webrev for about half the classes in java.util.stream. > None of these are public classes, so there are no public API issues > here, but plenty of internal API issues, naming issues (ooh, a > bikeshed), and code quality issues. > Hi Brian, All protected fields should not be protected but package visible. Classes are package private, so there is no need to use a modifier which offers a wider visibility. The same is true for constructors. For default methods, some of them are marked public and some are not; what does the coding convention say? Code convention again: there is a lot of if/else with no curly braces, or only curly braces on the if part but not on the else part. Also, when an if block ends with a return, there is no need to use 'else':

if (result != null) {
    foundResult(result);
    return result;
}
else
    return null;

can be simply written:

if (result != null) {
    foundResult(result);
    return result;
}
return null;

All inner classes should not have private constructors, like for example FindOp.FindTask, because the compiler will have to generate a special accessor for them when they are called from the outer class. In AbstractShortCircuitTask: It's not clear that cancel and sharedResult can be accessed directly, given that they both have methods that act as getter and setter. If they can be accessed directly, I think it's better to declare them private and to use getters. Depending on the ops, some of them do null checks of arguments at creation time (ForEachOp), some of them don't (FindOp). In ForEachUntilOp, the 'consumer' is checked but 'until' is not.
in ForEachOp, most of the keyword protected are not needed, ForEachUntilOp which inherits from ForEachOp is in the same package. In ForEachUntilOp, the constructor should be private (like for all the other ops). In MatchOp, line 110, I think the compiler bug is fixed now ? The enum MatchKind should not be public and all constructor of all inner classes should not be private. In OpsUtils, some static methods are public and some are not, in Tripwire: enabled should be in uppercase (ENABLED). method trip() should be: public static void trip(Class trippingClass, String msg) cheers, R?mi From tim at peierls.net Sat Feb 23 09:06:05 2013 From: tim at peierls.net (Tim Peierls) Date: Sat, 23 Feb 2013 12:06:05 -0500 Subject: Code review request In-Reply-To: <512672E6.1050708@oracle.com> References: <512672E6.1050708@oracle.com> Message-ID: On Thu, Feb 21, 2013 at 2:17 PM, Brian Goetz wrote: > At > http://cr.openjdk.java.net/~**briangoetz/jdk-8008670/webrev/ > > I've posted a webrev for about half the classes in java.util.stream. None > of these are public classes, so there are no public API issues here, but > plenty of internal API issues, naming issues (ooh, a bikeshed), and code > quality issues. > Things I noticed before I ran out of steam: In AbstractTask the use of multicharacter type parameters is confusing, especially with an underscore. AbstractTask, , or even would be better. BiBlock -> BiConsumer in Map.java comments. --tim From joe.bowbeer at gmail.com Sat Feb 23 11:42:30 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 23 Feb 2013 11:42:30 -0800 Subject: Code review request In-Reply-To: <512672E6.1050708@oracle.com> References: <512672E6.1050708@oracle.com> Message-ID: We should send these comments in emails? I don't see a way to comment at the link provided. I repeat some of Remi's comments regarding formatting below. File: http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/src/share/classes/java/util/Map.java.patch 1. 
Please run this through a code formatter to conform with Oracle's standard. Things to fix: parameter wrapping should indent only 8 spaces: + default V merge(K key, V value, + BiFunction remappingFunction) { if-else brace should be on same line: + } + else if ((newValue = remappingFunction.apply(oldValue, value)) != null) { multi-line 'if' always needs braces? + if (replace(key, oldValue, newValue)) + return newValue; 2. replaceAll javadoc: Function#map => Function#apply calling the function's {@code Function#map} method => calling the function's {@code Function#apply} method 3. replaceAll question What's with all the finals? + final Iterator> entries = entrySet().iterator(); + while (entries.hasNext()) { + final Map.Entry entry = entries.next(); + entry.setValue(function.apply(entry.getKey(), entry.getValue())); + } Why not code this as follows, just like forEach? + for (Map.Entry entry : entrySet()) { + entry.setValue(function.apply(entry.getKey(), entry.getValue())); + } --Joe On Thu, Feb 21, 2013 at 11:17 AM, Brian Goetz wrote: > At > http://cr.openjdk.java.net/~**briangoetz/jdk-8008670/webrev/ > > I've posted a webrev for about half the classes in java.util.stream. None > of these are public classes, so there are no public API issues here, but > plenty of internal API issues, naming issues (ooh, a bikeshed), and code > quality issues. > > From joe.bowbeer at gmail.com Sun Feb 24 13:09:45 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sun, 24 Feb 2013 13:09:45 -0800 Subject: Code review request In-Reply-To: References: <512672E6.1050708@oracle.com> Message-ID: A few more comments. 1. General: The method descriptions should be written 3rd person declarative, according to Oracle's style guide http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#styleguide This is not followed in many places. 
For example: Get the {@code StreamShape} describing the input shape of the pipeline => Gets the {@code StreamShape} describing the input shape of the pipeline. 2. Typo (missing space) in PipelineHelper javadoc: 40 * the last intermediate operation described by this {@code PipelineHelper}.The 3. StreamShape enum is missing its per-element javadoc --Joe On Sat, Feb 23, 2013 at 11:42 AM, Joe Bowbeer wrote: > We should send these comments in emails? I don't see a way to comment at > the link provided. > > I repeat some of Remi's comments regarding formatting below. > > File: > > > http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/src/share/classes/java/util/Map.java.patch > > 1. Please run this through a code formatter to conform with Oracle's > standard. Things to fix: > > parameter wrapping should indent only 8 spaces: > > + default V merge(K key, V value, > + BiFunction > remappingFunction) { > > if-else brace should be on same line: > > + } > + else if ((newValue = remappingFunction.apply(oldValue, value)) != null) { > > multi-line 'if' always needs braces? > > + if (replace(key, oldValue, newValue)) > + return newValue; > > > 2. replaceAll javadoc: Function#map => Function#apply > > calling the function's {@code Function#map} method > > => > calling the function's {@code Function#apply} method > > > 3. replaceAll question > > What's with all the finals? > > + final Iterator> entries = entrySet().iterator(); > + while (entries.hasNext()) { > + final Map.Entry entry = entries.next(); > + entry.setValue(function.apply(entry.getKey(), > entry.getValue())); > + } > > Why not code this as follows, just like forEach? > > + for (Map.Entry entry : entrySet()) { > + entry.setValue(function.apply(entry.getKey(), > entry.getValue())); > + } > > --Joe > > > On Thu, Feb 21, 2013 at 11:17 AM, Brian Goetz wrote: > >> At >> http://cr.openjdk.java.net/~**briangoetz/jdk-8008670/webrev/ >> >> I've posted a webrev for about half the classes in java.util.stream. 
None >> of these are public classes, so there are no public API issues here, but >> plenty of internal API issues, naming issues (ooh, a bikeshed), and code >> quality issues. >> >> > From david.holmes at oracle.com Sun Feb 24 19:07:48 2013 From: david.holmes at oracle.com (David Holmes) Date: Mon, 25 Feb 2013 13:07:48 +1000 Subject: Code review request In-Reply-To: <51289F3D.1010609@univ-mlv.fr> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> Message-ID: <512AD584.2080902@oracle.com> On 23/02/2013 8:51 PM, Remi Forax wrote: > On 02/21/2013 08:17 PM, Brian Goetz wrote: >> At >> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >> >> I've posted a webrev for about half the classes in java.util.stream. >> None of these are public classes, so there are no public API issues >> here, but plenty of internal API issues, naming issues (ooh, a >> bikeshed), and code quality issues. >> > > Hi Brian, > > All protected fields should not be protected but package visible. > Classes are package private so there is no need to use a modifier which > offer a wider visibility. > The same is true for constructors. I believe some of these may end up being public (TBD), in which case better to define member accessibility as if they were already public as it greatly simplifies the changes needed later. David ----- > For default method, some of them are marked public, some of them are not, > what the coding convention said ? > > Code convention again, there is a lot of if/else with no curly braces, > or only curly braces > on the if part but not on the else part. 
> Also, when a if block ends with a return, there is no need to use 'else', > > if (result != null) { > foundResult(result); > return result; > } > else > return null; > > can be simply written: > > if (result != null) { > foundResult(result); > return result; > } > return null; > > > All inner class should not have private constructors, like by example > FindOp.FindTask, because > the compiler will have to generate a special accessor for them when they > are called from > the outer class. > > In AbstractShortCircuitTask: > It's not clear that cancel and sharedResult can be accessed directly > given that they both have methods that acts as getter and setter. > If they can be accessed directly, I think it's better to declare them > private and to use getters. > > Depending on the ops, some of them do nullcheks of arguments at creating > time (ForEachOp) , some of them don't (FindOp). > In ForEachUntilOp, the 'consumer' is checked but 'until' is not. > > in ForEachOp, most of the keyword protected are not needed, > ForEachUntilOp which inherits from ForEachOp is in the same package. > > In ForEachUntilOp, the constructor should be private (like for all the > other ops). > > In MatchOp, line 110, I think the compiler bug is fixed now ? > The enum MatchKind should not be public and all constructor of all inner > classes should not be private. > > In OpsUtils, some static methods are public and some are not, > > in Tripwire: > enabled should be in uppercase (ENABLED). 
> method trip() should be: > public static void trip(Class trippingClass, String msg) > > cheers, > R?mi > From forax at univ-mlv.fr Mon Feb 25 01:03:24 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 25 Feb 2013 10:03:24 +0100 Subject: Code review request In-Reply-To: <512AD584.2080902@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <512AD584.2080902@oracle.com> Message-ID: <512B28DC.7050101@univ-mlv.fr> On 02/25/2013 04:07 AM, David Holmes wrote: > On 23/02/2013 8:51 PM, Remi Forax wrote: >> On 02/21/2013 08:17 PM, Brian Goetz wrote: >>> At >>> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >>> >>> I've posted a webrev for about half the classes in java.util.stream. >>> None of these are public classes, so there are no public API issues >>> here, but plenty of internal API issues, naming issues (ooh, a >>> bikeshed), and code quality issues. >>> >> >> Hi Brian, >> >> All protected fields should not be protected but package visible. >> Classes are package private so there is no need to use a modifier which >> offer a wider visibility. >> The same is true for constructors. > > I believe some of these may end up being public (TBD), in which case > better to define member accessibility as if they were already public > as it greatly simplifies the changes needed later. > > David > ----- Given that the release of jdk9 is at least two years from now, this API will change, one will come with a GPU pipeline (Sumatra?) or with a flattened bytecode pipeline (my pet project), so trying to figure out now what should be public or not is like predicting the future in a crystal ball. I think it's better to let all members package private and see later. BTW, I have no problem with protected methods, my main concern is protected fields or protected inner classes. R?mi > > >> For default method, some of them are marked public, some of them are >> not, >> what the coding convention said ? 
>> >> Code convention again, there is a lot of if/else with no curly braces, >> or only curly braces >> on the if part but not on the else part. >> Also, when a if block ends with a return, there is no need to use >> 'else', >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> else >> return null; >> >> can be simply written: >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> return null; >> >> >> All inner class should not have private constructors, like by example >> FindOp.FindTask, because >> the compiler will have to generate a special accessor for them when they >> are called from >> the outer class. >> >> In AbstractShortCircuitTask: >> It's not clear that cancel and sharedResult can be accessed directly >> given that they both have methods that acts as getter and setter. >> If they can be accessed directly, I think it's better to declare them >> private and to use getters. >> >> Depending on the ops, some of them do nullcheks of arguments at creating >> time (ForEachOp) , some of them don't (FindOp). >> In ForEachUntilOp, the 'consumer' is checked but 'until' is not. >> >> in ForEachOp, most of the keyword protected are not needed, >> ForEachUntilOp which inherits from ForEachOp is in the same package. >> >> In ForEachUntilOp, the constructor should be private (like for all the >> other ops). >> >> In MatchOp, line 110, I think the compiler bug is fixed now ? >> The enum MatchKind should not be public and all constructor of all inner >> classes should not be private. >> >> In OpsUtils, some static methods are public and some are not, >> >> in Tripwire: >> enabled should be in uppercase (ENABLED). 
>> method trip() should be: >> public static void trip(Class trippingClass, String msg) >> >> cheers, >> R?mi >> From paul.sandoz at oracle.com Mon Feb 25 09:31:32 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 25 Feb 2013 18:31:32 +0100 Subject: Code review request In-Reply-To: <51289F3D.1010609@univ-mlv.fr> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> Message-ID: <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> Hi Remi, Thanks for the feedback i have addressed some of this, mostly related to inner classes, in following change set to the lambda repo: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea We can update the webrev next week. On Feb 23, 2013, at 11:51 AM, Remi Forax wrote: > On 02/21/2013 08:17 PM, Brian Goetz wrote: >> At >> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >> >> I've posted a webrev for about half the classes in java.util.stream. None of these are public classes, so there are no public API issues here, but plenty of internal API issues, naming issues (ooh, a bikeshed), and code quality issues. >> > > Hi Brian, > > All protected fields should not be protected but package visible. > Classes are package private so there is no need to use a modifier which offer a wider visibility. > The same is true for constructors. > I agree with this, if there are no further objections i will fix in the lambda repo towards the end of the week. > For default method, some of them are marked public, some of them are not, > what the coding convention said ? > AFAICT "public" was only on two such default methods, so i have removed that modifier. > Code convention again, there is a lot of if/else with no curly braces, or only curly braces > on the if part but not on the else part. 
> Also, when a if block ends with a return, there is no need to use 'else', > > if (result != null) { > foundResult(result); > return result; > } > else > return null; > > can be simply written: > > if (result != null) { > foundResult(result); > return result; > } > return null; > Regarding code conventions i would prefer to auto-format all code to ensure consistency, as to what that consistency is, well we could argue until heat death of the universe :-) I am fine as long as it is consistent and easy to hit Alt-Cmd-L or what ever it is in ones favourite IDE. > > All inner class should not have private constructors, like by example FindOp.FindTask, because > the compiler will have to generate a special accessor for them when they are called from > the outer class. > I have made changes to all inner classes to conform to this. I have also marked all classes as final where appropriate. > In AbstractShortCircuitTask: > It's not clear that cancel and sharedResult can be accessed directly given that they both have methods that acts as getter and setter. > If they can be accessed directly, I think it's better to declare them private and to use getters. > They should be private, they are not accessed outside of that class. I will fix. > Depending on the ops, some of them do nullcheks of arguments at creating time (ForEachOp) , some of them don't (FindOp). > In ForEachUntilOp, the 'consumer' is checked but 'until' is not. > OK, there are probably lots of missing null checks in the code... > in ForEachOp, most of the keyword protected are not needed, ForEachUntilOp which inherits from ForEachOp is in the same package. > > In ForEachUntilOp, the constructor should be private (like for all the other ops). > Done. > In MatchOp, line 110, I think the compiler bug is fixed now ? Not yet, i can still reproduce it. > The enum MatchKind should not be public and all constructor of all inner classes should not be private. > Done. 
> In OpsUtils, some static methods are public and some are not, > OpUtils is now gone in the lambda repo. The forEach and reduce functionality is moved into the corresponding op classes. The static method has been moved to a default method on PipelineHelper. > in Tripwire: > enabled should be in uppercase (ENABLED). > method trip() should be: > public static void trip(Class trippingClass, String msg) > Done. I also made the field and method package private. Thanks, Paul. From joe.bowbeer at gmail.com Mon Feb 25 12:46:15 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 25 Feb 2013 12:46:15 -0800 Subject: Code review request In-Reply-To: <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> Message-ID: On Feb 25, 2013 9:31 AM, "Paul Sandoz" wrote: > > Hi Remi, > > Thanks for the feedback i have addressed some of this, mostly related to inner classes, in following change set to the lambda repo: > > http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea > > We can update the webrev next week. > > > On Feb 23, 2013, at 11:51 AM, Remi Forax wrote: > >> On 02/21/2013 08:17 PM, Brian Goetz wrote: >>> >>> At >>> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >>> >>> I've posted a webrev for about half the classes in java.util.stream. None of these are public classes, so there are no public API issues here, but plenty of internal API issues, naming issues (ooh, a bikeshed), and code quality issues. 
>>> >> >> Hi Brian, >> >> Also, when a if block ends with a return, there is no need to use 'else', >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> else >> return null; >> >> can be simply written: >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> return null; >> > > Regarding code conventions i would prefer to auto-format all code to ensure consistency, as to what that consistency is, well we could argue until heat death of the universe :-) I am fine as long as it is consistent and easy to hit Alt-Cmd-L or what ever it is in ones favourite IDE. > The omission of else after a return is a refinement that is not covered in any style guide that I am aware of. However, I think most everything else is covered by these: http://www.oracle.com/technetwork/java/codeconv-138413.html Alt-Shift-F ;-) Joe From david.holmes at oracle.com Mon Feb 25 13:45:47 2013 From: david.holmes at oracle.com (David Holmes) Date: Tue, 26 Feb 2013 07:45:47 +1000 Subject: Code review request In-Reply-To: <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> Message-ID: <512BDB8B.9070005@oracle.com> On 26/02/2013 3:31 AM, Paul Sandoz wrote: > Hi Remi, > > Thanks for the feedback i have addressed some of this, mostly related to > inner classes, in following change set to the lambda repo: > > http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea I see a lot of private things that are now package-access. Is that because they are now being used within the package? The access modifiers document intended usage even if there is limited accessibility to the class defining the member. The idea that a class restricted to package-access should have member access modifiers restricted to package-only or else private, is just plain wrong in my view. Each type should have a public, protected and private API. 
The exposure of the type within a package is a separate matter. Package-access then becomes a limited-sharing mechanism. David > We can update the webrev next week. > > > On Feb 23, 2013, at 11:51 AM, Remi Forax > wrote: > >> On 02/21/2013 08:17 PM, Brian Goetz wrote: >>> At >>> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >>> >>> I've posted a webrev for about half the classes in java.util.stream. >>> None of these are public classes, so there are no public API issues >>> here, but plenty of internal API issues, naming issues (ooh, a >>> bikeshed), and code quality issues. >>> >> >> Hi Brian, >> >> All protected fields should not be protected but package visible. >> Classes are package private so there is no need to use a modifier >> which offer a wider visibility. >> The same is true for constructors. >> > > I agree with this, if there are no further objections i will fix in the > lambda repo towards the end of the week. > > >> For default method, some of them are marked public, some of them are not, >> what the coding convention said ? >> > > AFAICT "public" was only on two such default methods, so i have removed > that modifier. > > >> Code convention again, there is a lot of if/else with no curly braces, >> or only curly braces >> on the if part but not on the else part. >> Also, when a if block ends with a return, there is no need to use 'else', >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> else >> return null; >> >> can be simply written: >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> return null; >> > > Regarding code conventions i would prefer to auto-format all code to > ensure consistency, as to what that consistency is, well we could argue > until heat death of the universe :-) I am fine as long as it is > consistent and easy to hit Alt-Cmd-L or what ever it is in ones > favourite IDE. 
> > >> >> All inner class should not have private constructors, like by example >> FindOp.FindTask, because >> the compiler will have to generate a special accessor for them when >> they are called from >> the outer class. >> > > I have made changes to all inner classes to conform to this. I have also > marked all classes as final where appropriate. > > >> In AbstractShortCircuitTask: >> It's not clear that cancel and sharedResult can be accessed directly >> given that they both have methods that acts as getter and setter. >> If they can be accessed directly, I think it's better to declare them >> private and to use getters. >> > > They should be private, they are not accessed outside of that class. I > will fix. > > >> Depending on the ops, some of them do nullcheks of arguments at >> creating time (ForEachOp) , some of them don't (FindOp). >> In ForEachUntilOp, the 'consumer' is checked but 'until' is not. >> > > OK, there are probably lots of missing null checks in the code... > > >> in ForEachOp, most of the keyword protected are not needed, >> ForEachUntilOp which inherits from ForEachOp is in the same package. >> > > >> In ForEachUntilOp, the constructor should be private (like for all the >> other ops). >> > > Done. > > >> In MatchOp, line 110, I think the compiler bug is fixed now ? > > Not yet, i can still reproduce it. > > >> The enum MatchKind should not be public and all constructor of all >> inner classes should not be private. >> > > Done. > > >> In OpsUtils, some static methods are public and some are not, >> > > OpUtils is now gone in the lambda repo. The forEach and reduce > functionality is moved into the corresponding op classes. The static > method has been moved to a default method on PipelineHelper. > > >> in Tripwire: >> enabled should be in uppercase (ENABLED). >> method trip() should be: >> public static void trip(Class trippingClass, String msg) >> > > Done. I also made the field and method package private. > > Thanks, > Paul. 
> method trip() should be: > public static void trip(Class<?> trippingClass, String msg) > > Done. I also made the field and method package private. > > Thanks, > Paul. From sam at sampullara.com Mon Feb 25 21:16:27 2013 From: sam at sampullara.com (Sam Pullara) Date: Mon, 25 Feb 2013 21:16:27 -0800 Subject: Where has the map method on Optional moved? In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> Message-ID: I've never been comfortable with this. I'm glad Jed is calling it out. Can we make Optional first class or remove it? Sam On Mon, Feb 25, 2013 at 9:12 PM, Jed Wesley-Smith wrote: > Hi Paul, > > You don't get a choice, it is a (or forms a) monad, you just removed > the useful methods (map/flatMap aka fmap/bind). This leaves clients to > implement them (or the functionality) in an ad-hoc and possibly buggy > form themselves. > > It is a monad if there exists some pair of functions:
>
> A -> Option<A>
> Option<A> -> (A -> Option<B>) -> Option<B>
>
> The first is Optional.of, the second is currently:
>
> Optional<A> a = ...
> Optional<B> b = ...
> Function<A, Optional<B>> f = ...
> if (a.isPresent()) {
>     b = f.apply(a.get());
> } else {
>     b = Optional.empty();
> }
>
> rather than:
>
> Optional<A> a = ...
> Function<A, Optional<B>> f = ...
> final Optional<B> b = a.flatMap(f);
>
> cheers, > jed. > > On 26 February 2013 00:12, Paul Sandoz wrote: >> Hi Dhananjay, >> >> It is not missing, it was removed. >> >> java.util.Optional has a narrower scope than optional things in other languages. We are not trying to shoe-horn in an option monad. >> >> Paul. >> >> On Feb 23, 2013, at 12:27 AM, Dhananjay Nene wrote: >> >>> It seemed to be there on the Optional class in b61 but is missing now. Is >>> there some way to run map/flatMap operations on an Optional? >>> >>> Thanks >>> Dhananjay >>> >> >> > From forax at univ-mlv.fr Mon Feb 25 23:19:26 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 26 Feb 2013 08:19:26 +0100 Subject: Re: Where has the map method on Optional moved? Message-ID: <201302260719.r1Q7JNh4004890@monge.univ-mlv.fr> Yes, I vote to remove it because it doesn't map :) well with the java mindset.
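[Editorial sketch inserted for context: the equivalence Jed describes can be run end-to-end. The parse helper below is invented for illustration; bind is his hand-rolled isPresent/get expansion, shown next to Optional.flatMap.]

```java
import java.util.Optional;
import java.util.function.Function;

public class OptionalMonadDemo {
    // An invented function of type A -> Optional<B>: parse a string if possible.
    static Optional<Integer> parse(String s) {
        try {
            return Optional.of(Integer.parseInt(s.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // The hand-rolled "bind" Jed objects to, written out with isPresent/get.
    static <A, B> Optional<B> bind(Optional<A> a, Function<A, Optional<B>> f) {
        if (a.isPresent()) {
            return f.apply(a.get());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> some = Optional.of(" 42 ");
        Optional<String> none = Optional.empty();

        // The manual expansion and flatMap agree in both the present
        // and the empty case.
        System.out.println(bind(some, OptionalMonadDemo::parse));
        System.out.println(some.flatMap(OptionalMonadDemo::parse));
        System.out.println(none.flatMap(OptionalMonadDemo::parse));
    }
}
```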
That said, we already discussed that and other alternatives are less nice to use, at least until we use the static import trick (as with reducers) Sent from my Phone ----- Reply message ----- From: "Sam Pullara" To: Subject: Where has the map method on Optional moved? Date: Tue, Feb 26, 2013 06:16 I've never been comfortable with this. I'm glad Jed is calling it out. Can we make Optional first class or remove it? Sam On Mon, Feb 25, 2013 at 9:12 PM, Jed Wesley-Smith wrote: > Hi Paul, > > You don't get a choice, it is a (or forms a) monad, you just removed > the useful methods (map/flatMap aka fmap/bind). This leaves clients to > implement them (or the functionality) in an ad-hoc and possibly buggy > form themselves. > > It is a monad if there exists some pair of functions: > > A -> Option<A> > Option<A> -> (A -> Option<B>) -> Option<B> > > The first is Optional.of, the second is currently: > > Optional<A> a = ? > Optional<B> b = ? > Function<A, Optional<B>> f = ? > if (a.isPresent()) { > b = f.apply(a.get()); > } else { > b = Optional.empty(); > } > > rather than: > > Optional<A> a = ? > Function<A, Optional<B>> f = ? > final Optional<B> b = a.flatMap(f); > > cheers, > jed. > > On 26 February 2013 00:12, Paul Sandoz wrote: >> Hi Dhananjay, >> >> It is not missing it was removed. >> >> java.util.Optional has a narrower scope than optional things in other languages. We are not trying to shoe-horn in an option monad. >> >> Paul. >> >> On Feb 23, 2013, at 12:27 AM, Dhananjay Nene wrote: >> >>> It seemed to be there on the Optional class in b61 but is missing now. Is >>> there some way to run map/flatMap operations on an Optional? >>> >>> Thanks >>> Dhananjay >>> >> >> > From forax at univ-mlv.fr Tue Feb 26 00:15:00 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 26 Feb 2013 09:15:00 +0100 Subject: Where has the map method on Optional moved?
In-Reply-To: <201302260719.r1Q7JNh4004890@monge.univ-mlv.fr> References: <201302260719.r1Q7JNh4004890@monge.univ-mlv.fr> Message-ID: <512C6F04.1020206@univ-mlv.fr> On 02/26/2013 08:19 AM, Remi Forax wrote: > Yes, I vote to remove it because it doesn't map :) well with the java > mindset. > That said, we already discussed that and other alternatives are less > nice to use, at least until we use the static import trick (as with > reducers) just to be crystal clear.

interface Optionalizer<T, R> { // good name needed
    abstract R result(boolean isPresent, T element);

    public static <T> Optionalizer<T, T> orDefaultValue(T defaultValue) {
        return (isPresent, element) -> isPresent ? element : defaultValue;
    }

    public static <T> Optionalizer<T, Boolean> isPresent() {
        return (isPresent, element) -> isPresent;
    }

    public static <T> Optionalizer<T, Void> andIfPresent(Consumer<T> consumer) {
        return (isPresent, element) -> { if (isPresent) { consumer.accept(element); } return null; };
    }

    public static <T> Optionalizer<T, T> orNull() {
        // doesn't use orDefaultValue(null) because the returned lambda is not constant :)
        // maybe better to do a null check in orDefaultValue
        return (isPresent, element) -> isPresent ? element : null;
    }
}

with:

interface Stream<T> {
    <R> R findFirst(Optionalizer<T, R> optionalizer);
}

examples:

String s = streamOfString.findFirst(orDefaultValue(""));
boolean isPresent = streamOfString.findFirst(isPresent());
streamOfString.findFirst(andIfPresent(System.out::println));
String s2 = streamOfString.findFirst(orNull());

I think i can like this. Rémi > > > > Sent from my Phone > > ----- Reply message ----- > From: "Sam Pullara" > To: > Subject: Where has the map method on Optional moved? > Date: Tue, Feb 26, 2013 06:16 > > > I've never been comfortable with this. I'm glad Jed is calling it out. > Can we make Optional first class or remove it? > > Sam > > On Mon, Feb 25, 2013 at 9:12 PM, Jed Wesley-Smith > wrote: > > Hi Paul, > > > > You don't get a choice, it is a (or forms a) monad, you just removed > > the useful methods (map/flatMap aka fmap/bind).
This leaves clients to > > implement them (or the functionality) in an ad-hoc and possibly buggy > > form themselves. > > > > It is a monad if there exists some pair of functions: > > > > A -> Option<A> > > Option<A> -> (A -> Option<B>) -> Option<B> > > > > The first is Optional.of, the second is currently: > > > > Optional<A> a = ? > > Optional<B> b = ? > > Function<A, Optional<B>> f = ? > > if (a.isPresent()) { > > b = f.apply(a.get()); > > } else { > > b = Optional.empty(); > > } > > > > rather than: > > > > Optional<A> a = ? > > Function<A, Optional<B>> f = ? > > final Optional<B> b = a.flatMap(f); > > > > cheers, > > jed. > > > > On 26 February 2013 00:12, Paul Sandoz wrote: > >> Hi Dhananjay, > >> > >> It is not missing it was removed. > >> > >> java.util.Optional has a narrower scope than optional things in > other languages. We are not trying to shoe-horn in an option monad. > >> > >> Paul. > >> > >> On Feb 23, 2013, at 12:27 AM, Dhananjay Nene > wrote: > >> > >>> It seemed to be there on the Optional class in b61 but is missing > now. Is > >>> there some way to run map/flatMap operations on an Optional? > >>> > >>> Thanks > >>> Dhananjay > >>> > >> > >> > > From paul.sandoz at oracle.com Tue Feb 26 02:47:25 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 11:47:25 +0100 Subject: Where has the map method on Optional moved? In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> Message-ID: On Feb 26, 2013, at 6:16 AM, Sam Pullara wrote: > I've never been comfortable with this. I'm glad Jed is calling it out. > Can we make Optional first class or remove it? Trawling through the email archives i cannot find specific discussion on Optional.map/flatMap, anyone recall such discussion? There is some general discussion on not going down the route of an option monad.
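For concreteness, the pair of operations Jed describes (unit and bind) can be sketched with the map/flatMap methods that did eventually ship on java.util.Optional in JDK 8:

```java
import java.util.Optional;
import java.util.function.Function;

public class OptionalBind {
    public static void main(String[] args) {
        // unit: A -> Optional<A>
        Optional<String> a = Optional.of("42");

        // bind: Optional<A> -> (A -> Optional<B>) -> Optional<B>
        Function<String, Optional<Integer>> f = s ->
                s.chars().allMatch(Character::isDigit)
                        ? Optional.of(Integer.parseInt(s))
                        : Optional.<Integer>empty();

        Optional<Integer> b = a.flatMap(f);
        System.out.println(b.get()); // 42

        // the empty case short-circuits, replacing the ad-hoc if/else pattern
        Optional<Integer> none = Optional.of("x").flatMap(f);
        System.out.println(none.isPresent()); // false
    }
}
```

With flatMap available, the caller never touches isPresent()/get() directly, which is exactly the ad-hoc (and possibly buggy) client code the thread is worried about.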
Since we have added a Stream.flatMap method following the pattern expected of it, does that give more weight to the argument for adding such a method to Optional, perhaps with the same or a different name to disassociate it from Stream itself? I find myself more and more leaning towards the position that if we include an Option class some developers will expect bind functionality; it seems useful, and it is not hard to explain its use while avoiding mention of the "m" word. Paul. > > Sam > > On Mon, Feb 25, 2013 at 9:12 PM, Jed Wesley-Smith wrote: >> Hi Paul, >> >> You don't get a choice, it is a (or forms a) monad, you just removed >> the useful methods (map/flatMap aka fmap/bind). This leaves clients to >> implement them (or the functionality) in an ad-hoc and possibly buggy >> form themselves. >> >> It is a monad if there exists some pair of functions: >> >> A -> Option<A> >> Option<A> -> (A -> Option<B>) -> Option<B> >> >> The first is Optional.of, the second is currently: >> >> Optional<A> a = ? >> Optional<B> b = ? >> Function<A, Optional<B>> f = ? >> if (a.isPresent()) { >> b = f.apply(a.get()); >> } else { >> b = Optional.empty(); >> } >> >> rather than: >> >> Optional<A> a = ? >> Function<A, Optional<B>> f = ? >> final Optional<B> b = a.flatMap(f); >> >> cheers, >> jed. >> >> On 26 February 2013 00:12, Paul Sandoz wrote: >>> Hi Dhananjay, >>> >>> It is not missing it was removed. >>> >>> java.util.Optional has a narrower scope than optional things in other languages. We are not trying to shoe-horn in an option monad. >>> >>> Paul. >>> >>> On Feb 23, 2013, at 12:27 AM, Dhananjay Nene wrote: >>> >>>> It seemed to be there on the Optional class in b61 but is missing now. Is >>>> there some way to run map/flatMap operations on an Optional?
>>>> >>>> Thanks >>>> Dhananjay >>>> >>> >>> >> From paul.sandoz at oracle.com Tue Feb 26 03:03:00 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 12:03:00 +0100 Subject: Code review request In-Reply-To: References: <512672E6.1050708@oracle.com> Message-ID: <4D7CE6AD-5BC6-4FBA-A41E-C8BC182F2F64@oracle.com> Hi Tim, Thanks for the comments. On Feb 23, 2013, at 6:06 PM, Tim Peierls wrote: > On Thu, Feb 21, 2013 at 2:17 PM, Brian Goetz wrote: > >> At >> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >> >> I've posted a webrev for about half the classes in java.util.stream. None >> of these are public classes, so there are no public API issues here, but >> plenty of internal API issues, naming issues (ooh, a bikeshed), and code >> quality issues. >> > > Things I noticed before I ran out of steam: > > In AbstractTask the use of multicharacter type > parameters is confusing, especially with an underscore. AbstractTask R, C>, , or even would be better. > I added "P_IN" etc because i found i kept forgetting which single character type variable corresponded to what :-) So for the following:

interface IntermediateOp<E_IN, E_OUT> {
    Sink<E_IN> wrapSink(int flags, Sink<E_OUT> sink);
    default <P_IN> Node<E_OUT> evaluateParallel(PipelineHelper<P_IN, E_OUT> helper) ...
}

I found it clearer which variables were corresponding to the input/output types of the pipeline and which to the ops, and have attempted to stick to that pattern throughout the code, although i suspect it is not completely consistent. Having said that, i think consistency of use is the most important aspect. > BiBlock -> BiConsumer in Map.java comments. > Fixed in the lambda repo. Paul.
On Feb 23, 2013, at 8:42 PM, Joe Bowbeer wrote: > We should send these comments in emails? I don't see a way to comment at > the link provided. > > I repeat some of Remi's comments regarding formatting below. > > File: > > http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/src/share/classes/java/util/Map.java.patch > > 1. Please run this through a code formatter to conform with Oracle's > standard. Things to fix: > > parameter wrapping should indent only 8 spaces: > > + default V merge(K key, V value, > + BiFunction<? super V, ? super V, ? extends V> > remappingFunction) { > > if-else brace should be on same line: > > + } > + else if ((newValue = remappingFunction.apply(oldValue, value)) != null) { > > multi-line 'if' always needs braces? > > + if (replace(key, oldValue, newValue)) > + return newValue; > > > 2. replaceAll javadoc: Function#map => Function#apply > > calling the function's {@code Function#map} method > > => > calling the function's {@code Function#apply} method > Fixed in the lambda repo. > > 3. replaceAll question > > What's with all the finals? > > + final Iterator<Map.Entry<K, V>> entries = entrySet().iterator(); > + while (entries.hasNext()) { > + final Map.Entry<K, V> entry = entries.next(); > + entry.setValue(function.apply(entry.getKey(), > entry.getValue())); > + } > > Why not code this as follows, just like forEach? > > + for (Map.Entry<K, V> entry : entrySet()) { > + entry.setValue(function.apply(entry.getKey(), > entry.getValue())); > + } > Fixed. On Feb 24, 2013, at 10:09 PM, Joe Bowbeer wrote: > A few more comments. > > 1. General: > > The method descriptions should be written 3rd person declarative, according > to Oracle's style guide > > http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#styleguide > > This is not followed in many places. For example: > > Get the {@code StreamShape} describing the input shape of the pipeline > > => > Gets the {@code StreamShape} describing the input shape of the pipeline. > > > 2.
Typo (missing space) in PipelineHelper javadoc: > > 40 * the last intermediate operation described by this {@code > PipelineHelper}.The > Fixed. > > 3. StreamShape enum is missing its per-element javadoc > Fixed. Paul. From tim at peierls.net Tue Feb 26 04:37:19 2013 From: tim at peierls.net (Tim Peierls) Date: Tue, 26 Feb 2013 07:37:19 -0500 Subject: Where has the map method on Optional moved? In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> Message-ID: On Tue, Feb 26, 2013 at 5:47 AM, Paul Sandoz wrote: > On Feb 26, 2013, at 6:16 AM, Sam Pullara wrote: > > I've never been comfortable with this. I'm glad Jed is calling it out. > > Can we make Optional first class or remove it? > > Trawling through the email archives i cannot find specific discussion on > Optional.map/flatMap, anyone recall such discussion? There is some general > discussion on not going down the route of an option monad. > In particular, you wrote: > java.util.Optional has a narrower scope that optional things in other > languages. We are not trying to shoe-horn in an option monad. And I say amen to that. Keep Optional, and keep it lean and mean. (At least as lean as the Guava Optional.) > Since we have added a Stream.flatMap method following the pattern expected > of it, does that give more weight to argument of adding such a method to > Optional, perhaps of the same or a different name to disassociate with > Stream itself? > No, it doesn't. The two are unrelated. Stream is a lofty abstraction and Optional is (or should be) a grunt-level utility. > I find myself more and more leaning towards the position of if we include > an Option class some developers will expect bind functionality, it seems > useful, and nor is it hard to explain its use while avoiding mention the > "m" word. 
> The more you saddle Optional with "might be useful" and "X might want this" features, the harder it will be for regular people -- people who don't know what a monad is -- to assimilate and use Optional sensibly. One of the reasons Doug Lea is not an Optional fan is fear of things like Collection<Optional<T>>, something that is less likely to happen with a minimalist Optional. Here's another reason to stay lean: The more limited Optional is, the easier it will be some day to optimize away the extra object. Make it a first class participant and you can kiss those optimizations goodbye. --tim From paul.sandoz at oracle.com Tue Feb 26 04:49:48 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 13:49:48 +0100 Subject: Where has the map method on Optional moved? In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> Message-ID: <7673857C-B407-489F-AB19-7B5317AB9AE7@oracle.com> On Feb 26, 2013, at 1:37 PM, Tim Peierls wrote: > On Tue, Feb 26, 2013 at 5:47 AM, Paul Sandoz wrote: > On Feb 26, 2013, at 6:16 AM, Sam Pullara wrote: > > I've never been comfortable with this. I'm glad Jed is calling it out. > > Can we make Optional first class or remove it? > > Trawling through the email archives i cannot find specific discussion on Optional.map/flatMap, anyone recall such discussion? There is some general discussion on not going down the route of an option monad. > > In particular, you wrote: > java.util.Optional has a narrower scope than optional things in other languages. We are not trying to shoe-horn in an option monad. > > And I say amen to that. Keep Optional, and keep it lean and mean. (At least as lean as the Guava Optional.)
> I notice Guava's Optional has a transform method: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Optional.html#transform(com.google.common.base.Function) > > Since we have added a Stream.flatMap method following the pattern expected of it, does that give more weight to argument of adding such a method to Optional, perhaps of the same or a different name to disassociate with Stream itself? > > No, it doesn't. The two are unrelated. Stream is a lofty abstraction and Optional is (or should be) a grunt-level utility. > > > I find myself more and more leaning towards the position of if we include an Option class some developers will expect bind functionality, it seems useful, and nor is it hard to explain its use while avoiding mention the "m" word. > > The more you saddle Optional with "might be useful" and "X might want this" features, the harder it will be for regular people -- people who don't know what a monad is -- to assimilate and use Optional sensibly. One of the reasons Doug Lea is not an Optional fan is fear of things like Collection>, something that is less likely to happen with a minimalist Optional. > Especially so if we make it more difficult by removing the hashcode and equals methods. > Here's another reason to stay lean: The more limited Optional is, the easier it will be some day to optimize away the extra object. Make it a first class participant and you can kiss those optimizations goodbye. > Very true. Paul. From tim at peierls.net Tue Feb 26 05:02:55 2013 From: tim at peierls.net (Tim Peierls) Date: Tue, 26 Feb 2013 08:02:55 -0500 Subject: Where has the map method on Optional moved? In-Reply-To: <7673857C-B407-489F-AB19-7B5317AB9AE7@oracle.com> References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> <7673857C-B407-489F-AB19-7B5317AB9AE7@oracle.com> Message-ID: On Tue, Feb 26, 2013 at 7:49 AM, Paul Sandoz wrote: > Keep Optional, and keep it lean and mean. (At least as lean as the Guava > Optional.) 
> > I notice Guava's Optional has a transform method: > Right -- "at *least* as lean". But leaner is better. I haven't used the transform method. It feels like the top of the slippery slope that leads to monstrosities like Map<K, Optional<V>>. > One of the reasons Doug Lea is not an Optional fan is fear of things like > Collection<Optional<T>>, something that is less likely to happen with a > minimalist Optional. > > Especially so if we make it more difficult by removing the hashCode and > equals methods. > Yes! With docs explaining why. --tim From paul.sandoz at oracle.com Tue Feb 26 05:11:50 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 14:11:50 +0100 Subject: Code review request In-Reply-To: <512BDB8B.9070005@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> <512BDB8B.9070005@oracle.com> Message-ID: <958F39D3-EB3A-4FD2-ABAF-5109BDDD8D49@oracle.com> On Feb 25, 2013, at 10:45 PM, David Holmes wrote: > On 26/02/2013 3:31 AM, Paul Sandoz wrote: >> Hi Remi, >> >> Thanks for the feedback i have addressed some of this, mostly related to >> inner classes, in following change set to the lambda repo: >> >> http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea > > I see a lot of private things that are now package-access. I presume you mean on constructors of private inner classes? > Is that because they are now being used within the package? > No, it is to avoid the creation of a synthetic package private constructor called by enclosing class to construct the inner class. > The access modifiers document intended usage even if there is limited accessibility to the class defining the member. The idea that a class restricted to package-access should have member access modifiers restricted to package-only or else private, is just plain wrong in my view. Each type should have a public, protected and private API.
Package-access then becomes a limited-sharing mechanism. > For private inner classes i took the view that protected on fields offered little value, but paused for top level classes. There are not many use-cases in the JDK at least for the packages i browsed. The class java.util.concurrent.atomic.Striped64 does not bother with protected. I am leaning towards the opinion that protected is just noise in these cases since the compiler offers no protection. Paul. From forax at univ-mlv.fr Tue Feb 26 05:40:37 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 26 Feb 2013 14:40:37 +0100 Subject: Code review request In-Reply-To: <958F39D3-EB3A-4FD2-ABAF-5109BDDD8D49@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> <512BDB8B.9070005@oracle.com> <958F39D3-EB3A-4FD2-ABAF-5109BDDD8D49@oracle.com> Message-ID: <512CBB55.7080301@univ-mlv.fr> On 02/26/2013 02:11 PM, Paul Sandoz wrote: > On Feb 25, 2013, at 10:45 PM, David Holmes wrote: > >> On 26/02/2013 3:31 AM, Paul Sandoz wrote: >>> Hi Remi, >>> >>> Thanks for the feedback i have addressed some of this, mostly related to >>> inner classes, in following change set to the lambda repo: >>> >>> http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea >> I see a lot of private things that are now package-access. > I presume you mean on constructors of private inner classes? > > >> Is that because they are now being used within the package? >> > No, it is to avoid the creation of a synthetic package private constructor called by enclosing class to construct the inner class. > > >> The access modifiers document intended usage even if there is limited accessibility to the class defining the member. The idea that a class restricted to package-access should have member access modifiers restricted to package-only or else private, is just plain wrong in my view. Each type should have a public, protected and private API. 
The exposure of the type within a package is a separate matter. Package-access then becomes a limited-sharing mechanism. >> > For private inner classes i took the view that protected on fields offered little value, but paused for top level classes. > > There are not many use-cases in the JDK at least for the packages i browsed. The class java.util.concurrent.atomic.Striped64 does not bother with protected. > > I am leaning towards the opinion that protected is just noise in these cases since the compiler offers no protection. amen :) > > Paul. Rémi From forax at univ-mlv.fr Tue Feb 26 05:50:11 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 26 Feb 2013 14:50:11 +0100 Subject: Code review request In-Reply-To: <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> Message-ID: <512CBD93.1080805@univ-mlv.fr> On 02/25/2013 06:31 PM, Paul Sandoz wrote: > Hi Remi, > > Thanks for the feedback i have addressed some of this, mostly related to inner classes, in following change set to the lambda repo: > > http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea > > We can update the webrev next week. There are still some methods that are declared 'default public' and some that are declared just with 'default'. I propose the following code convention for abstract/default methods in interfaces. All methods in an interface are marked public (just because we may support private static methods in jdk9), default methods should be 'public default' and not 'default public', like we have public static, visibility modifier first, and abstract methods in the same interface should be declared only 'public'. Rémi ...
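The modifier-ordering convention proposed above can be illustrated with a small interface; the names here are made up for the example:

```java
// A hypothetical interface following the proposed convention:
// 'public default' rather than 'default public', and abstract
// methods declared with 'public' only.
interface Wrapper<T> {
    public T get();                           // abstract: 'public' alone

    public default T orElse(T fallback) {     // visibility first, then 'default'
        T value = get();
        return value != null ? value : fallback;
    }
}

public class ConventionDemo {
    public static void main(String[] args) {
        Wrapper<String> empty = () -> null;
        System.out.println(empty.orElse("fallback")); // fallback
    }
}
```

Both orderings compile; the convention is purely about consistency with 'public static' and readability when scanning an interface.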
From paul.sandoz at oracle.com Tue Feb 26 07:41:14 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 16:41:14 +0100 Subject: Code review request In-Reply-To: <512CBD93.1080805@univ-mlv.fr> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> <512CBD93.1080805@univ-mlv.fr> Message-ID: <20D89B20-0604-45E9-AAA7-38F74298C77A@oracle.com> On Feb 26, 2013, at 2:50 PM, Remi Forax wrote: > On 02/25/2013 06:31 PM, Paul Sandoz wrote: >> Hi Remi, >> >> Thanks for the feedback i have addressed some of this, mostly related to inner classes, in following change set to the lambda repo: >> >> http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea >> >> We can update the webrev next week. > > > > There are still some methods that are declared 'default public' Where? > and some that are declared just with 'default'. > > I propose the following code convention for abstract/default method in interface. > All methods in interface are marked public (just because we may support private static method in jdk9), > default method should be 'public default' and not 'default public', like we have public static, visibility modifier first, > and abstract methods in the same interface should be declared only 'public'. > I do not relish your proposal of changing all abstract methods in interfaces to be declared redundantly public because of potential future features; even if such features are highly likely, we should have that discussion when those features arrive. The source in the java.util.function package uses "public default" for default methods. That source has been through a round of reviews, strongly indicating this was the preferred approach. Mike, is that so? Paul. From mike.duigou at oracle.com Tue Feb 26 08:56:00 2013 From: mike.duigou at oracle.com (Mike Duigou) Date: Tue, 26 Feb 2013 08:56:00 -0800 Subject: Where has the map method on Optional moved?
In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> <7673857C-B407-489F-AB19-7B5317AB9AE7@oracle.com> Message-ID: <5A2AA78F-5A1B-4117-84AF-6E87A5B1D396@oracle.com> On Feb 26 2013, at 05:02 , Tim Peierls wrote: > On Tue, Feb 26, 2013 at 7:49 AM, Paul Sandoz wrote: >> Keep Optional, and keep it lean and mean. (At least as lean as the Guava Optional.) > I notice Guava's Optional has a transform method: > > Right -- "at least as lean". But leaner is better. I haven't used the transform method. It feels like the top of the slippery slope that leads to monstrosities like Map<K, Optional<V>>. > > >> One of the reasons Doug Lea is not an Optional fan is fear of things like Collection<Optional<T>>, something that is less likely to happen with a minimalist Optional. > Especially so if we make it more difficult by removing the hashCode and equals methods. > > Yes! With docs explaining why. I am working on this now. The methods won't be removed but will support only the identity hashCode()/equals(). > > --tim > From daniel.smith at oracle.com Tue Feb 26 13:47:10 2013 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 26 Feb 2013 14:47:10 -0700 Subject: flatMap ambiguity Message-ID: A thread on lambda-dev highlighted a problem with the overloading of flatMap:

<R> Stream<R> flatMap(FlatMapper<? super T, R> mapper)
IntStream flatMap(FlatMapper.ToInt<? super T> mapper)
LongStream flatMap(FlatMapper.ToLong<? super T> mapper)
DoubleStream flatMap(FlatMapper.ToDouble<? super T> mapper)

These functional interfaces have corresponding descriptors:

(T, Consumer<R>) -> void (R is inferred)
(T, IntConsumer) -> void
(T, LongConsumer) -> void
(T, DoubleConsumer) -> void

This violates the general rule that overloading with functional interfaces of the same shape shouldn't use different parameter types.
Various ambiguities result:

- "(x, sink) -> sink.accept(10)" is compatible with all the primitive consumers, and we have no way to disambiguate (the new most-specific rules handle this sort of thing with return types, but are not designed to decide which type is "better" for an arbitrary block of code).

- "(x, sink) -> sink.accept(22.0)" is compatible with "(T, DoubleConsumer) -> void", and leads "(T, Consumer<R>) -> void" to be provisionally applicable -- we don't check the body at all in that case, until we've had a chance to look at the target type of the 'flatMap' invocation and figure out what R is supposed to be. So, again, an ambiguity will occur.

It would probably be best to give the primitive versions distinct names. --- Note also that an invocation like the following will always produce a Stream<Object>: stream.flatMap((x, sink) -> sink.accept("x")).filter(...).... Inference is forced to resolve R without knowing anything about it, and so it must go with the default "Object". The only way to get useful information about R is to derive bounds from the body of the lambda, and that's simply not something we can do in general. I don't know what to recommend in this case, except perhaps that this method inherently depends on some explicit typing (e.g., "(String x, Consumer<String> sink) -> ..."). --Dan From maurizio.cimadamore at oracle.com Tue Feb 26 14:54:20 2013 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Tue, 26 Feb 2013 22:54:20 +0000 Subject: flatMap ambiguity In-Reply-To: References: Message-ID: <512D3D1C.2040508@oracle.com> On 26/02/13 21:47, Dan Smith wrote: > Note also that an invocation like the following will always produce a Stream<Object>: > > stream.flatMap((x, sink) -> sink.accept("x")).filter(...).... Mostly unrelated: the current implementation will actually give up in such cases, issuing a 'cyclic inference' error message to give opportunity to add more type info. Maurizio
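Distinct names are in fact how this settled out in the JDK 8 that eventually shipped: the primitive-producing variants became flatMapToInt, flatMapToLong and flatMapToDouble, taking functions that return the corresponding stream type, so overload resolution never has to compare lambda bodies. A sketch using the shipped API:

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class FlatMapNaming {
    public static void main(String[] args) {
        // Object-bearing flatMap: T -> Stream<R>, no overload clash
        long words = Stream.of("a b", "c d e")
                .flatMap(line -> Stream.of(line.split(" ")))
                .count();
        System.out.println(words); // 5

        // Primitive variant disambiguated by name, not by parameter type
        int totalLength = Stream.of("a b", "c d e")
                .flatMapToInt(line -> IntStream.of(line.length()))
                .sum();
        System.out.println(totalLength); // 8
    }
}
```

Because each method has a unique name, "(line -> ...)" is checked against exactly one functional interface, avoiding both the most-specific ambiguity and the provisional-applicability problem Dan describes.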