From paul.sandoz at oracle.com Fri Feb 1 01:22:02 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 1 Feb 2013 10:22:02 +0100
Subject: Encounter order: take 2
In-Reply-To: 
References: <3843CA7B-CA6F-425A-990B-40D1250CF1AB@oracle.com>
Message-ID: <27A3D1EC-361D-4D4E-8C73-A674EC6772A4@oracle.com>

On Feb 1, 2013, at 12:27 AM, Mike Duigou wrote:
>>
>> An intermediate operation may clear encounter order, so that the output stream and the corresponding input stream to the next intermediate or terminal operation do not have encounter order.
>> There are no such operations implemented.
>> (Previously the unordered() operation cleared encounter order.)
>>
>> Otherwise an intermediate operation must preserve encounter order if required to do so (see next paragraphs).
>>
>> An intermediate operation may choose to apply a different algorithm depending on whether the encounter order of the input stream must be preserved.
>> The distinct() operation will, when evaluating in parallel, use a ConcurrentHashMap to store unique elements if encounter order does not need to be preserved; otherwise, if encounter order needs to be preserved, a fold will be performed (equivalent of, in parallel, mapping each element to a singleton linked set, then associatively reducing, left-to-right, the linked sets to one linked set).
>
> Without unordered() how is the CHM version accessed if the source is an ArrayList?
>

If the source has encounter order, the distinct operation will choose whether to preserve encounter order as per clause b.2, i.e. the properties of the terminal operation are a factor. Implementation-wise, the op checks whether the ORDERED flag is on the bit set of flags passed to it, so it is not as complicated as it sounds.
>> An intermediate operation must preserve the encounter order of the output stream if:
>>
>> a.1) the input stream to the intermediate operation has encounter order (either because the stream source has encounter order or because a previous intermediate operation injects encounter order); and
>> a.2) the terminal operation preserves encounter order.
>>
>> An intermediate operation need not preserve the encounter order of the output stream if:
>>
>> b.1) the input stream to the intermediate operation does not have encounter order (either because the stream source does not have encounter order or because a previous intermediate operation clears encounter order); or
>> b.2) the terminal operation does not preserve encounter order *and* the intermediate operation is in a sequence of operations, to be computed, where the last operation in the sequence is the terminal operation and all operations in the sequence are computed in parallel.
>>
>> Rule b.2 above ensures that encounter order is preserved for the following pipeline on the sequential().forEach():
>>
>>   list.parallelStream().distinct().sequential().forEach()
>>
>> i.e. the distinct() intermediate operation will preserve the encounter order of the list stream source.
>
> I find this result surprising (and disappointing). Users are going to be surprised by the poor performance of using parallelStream in this case.
>

The sequential() op is currently implemented as a full barrier to ensure elements are reported sequentially downstream in the same thread that created the stream. The following will produce the same output:

  list.stream().distinct().forEach(...)
  list.parallelStream().distinct().sequential().forEach(...)

which I think conforms to the principle of least surprise.

If performance is a concern and order is not, then one should not use sequential(), e.g. do:

  list.parallelStream().distinct().forEach(e -> { synchronized(this) { ... } })

or a concurrent collect.

Paul.
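For reference, the trade-off discussed above survives into java.util.stream as it finally shipped, where unordered() (which this draft had removed) is how a pipeline opts out of encounter order, and sequential() is no longer a barrier. A minimal sketch against the final API, not the 2013 draft:

```java
import java.util.List;
import java.util.stream.Collectors;

public class EncounterOrder {
    public static void main(String[] args) {
        List<Integer> list = List.of(3, 1, 3, 2, 1, 2);

        // Ordered source + order-preserving terminal: distinct() must keep
        // the encounter order of first occurrences, even in parallel.
        List<Integer> ordered = list.parallelStream()
                .distinct()
                .collect(Collectors.toList());
        System.out.println(ordered);  // [3, 1, 2]

        // unordered() clears encounter order, freeing distinct() to use a
        // concurrent algorithm; the *set* of elements is the same, but the
        // result order is unspecified, so we sort only to print stably here.
        List<Integer> unordered = list.parallelStream()
                .unordered()
                .distinct()
                .sorted()
                .collect(Collectors.toList());
        System.out.println(unordered);  // [1, 2, 3]
    }
}
```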
From brian.goetz at oracle.com Fri Feb 1 07:01:14 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 01 Feb 2013 10:01:14 -0500
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510BC5BC.2050509@redhat.com>
References: <510B2903.4070000@oracle.com> <510BC5BC.2050509@redhat.com>
Message-ID: <510BD8BA.2050501@oracle.com>

> I agree that the different parts should be separately inheritable. If a
> subtype overrides a method though, I think only the main doc should be
> inherited (since the implspec parts seem to be mainly for the benefit of
> implementers, and afaict you cannot override a default method without
> changing or dropping its implementation).

OK good, since that's how it was proposed.

> I'm not sure I'm really keen on separating specification from notes
> though. That seems pretty specific to organizational preferences and
> conventions. It seems to me that you'd want to inherit API notes along
> with API spec always, and you'd want to keep implementation notes with
> the implementation spec always, thus it just becomes a formatting
> nicety. Put another way, we've gone this long without; why do we
> suddenly need it now?

It is not like the "need it" function has gone precipitously from zero to one. Realistically, it's been creeping up slowly for years; API maintainers have had to go through all sorts of hoop-jumping to specify things, and much effort has been spent fixing spec bugs (or worse, living with bad spec) that amount to conflating normative/informative or API/implementation spec. Adding default methods will increase the need in a nontrivial way.

Until recently, there were only a few examples of optional methods in the JDK, mostly the mutative methods in Abstract{Collection,List,Map}. While there were only a few of them, we could get by with a crutch. But as we'll be getting more, the crutches don't scale.
People constantly make mistakes with the use of phrases like "this implementation" where it's not clear what that actually means. So, it's been a problem all along, been getting slowly worse with age, and we're about to dump some gas on an already-burning fire.

From scolebourne at joda.org Fri Feb 1 07:36:57 2013
From: scolebourne at joda.org (Stephen Colebourne)
Date: Fri, 1 Feb 2013 15:36:57 +0000
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510B2903.4070000@oracle.com>
References: <510B2903.4070000@oracle.com>
Message-ID: 

Thanks for the thread. I mostly agree.

On 1 February 2013 02:31, Brian Goetz wrote:
> We've tried this thread a few times without success, so let's try it again.

Should this be beyond Project Lambda EG?

> There are lots of things we might want to document about a method in an API.
> Historically we've framed them as either being "specification" (e.g.,
> necessary postconditions) or "implementation notes" (e.g., hints that give
> the user an idea what's going on under the hood.) But really, there are
> four boxes (and we've been cramming them into two):
>
> { API, implementation } x { specification, notes }

What about the difference between what implementors of "Java SE" need to do vs subclass writers? I'm guessing you intend both to be @implspec, but is that enough?

thanks
Stephen

From david.lloyd at redhat.com Fri Feb 1 05:40:12 2013
From: david.lloyd at redhat.com (David M. Lloyd)
Date: Fri, 01 Feb 2013 07:40:12 -0600
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510B2903.4070000@oracle.com>
References: <510B2903.4070000@oracle.com>
Message-ID: <510BC5BC.2050509@redhat.com>

I agree that the different parts should be separately inheritable.
If a subtype overrides a method though, I think only the main doc should be inherited (since the implspec parts seem to be mainly for the benefit of implementers, and afaict you cannot override a default method without changing or dropping its implementation).

I'm not sure I'm really keen on separating specification from notes though. That seems pretty specific to organizational preferences and conventions. It seems to me that you'd want to inherit API notes along with API spec always, and you'd want to keep implementation notes with the implementation spec always, thus it just becomes a formatting nicety. Put another way, we've gone this long without; why do we suddenly need it now?

On 01/31/2013 08:31 PM, Brian Goetz wrote:
> We've tried this thread a few times without success, so let's try it again.
>
> There have been a number of cases where it's not obvious how to document
> default methods. After some analysis, this appears to be just another
> case where the complexity was present in Java from day 1, and default
> methods simply bring it to the fore because the confusing cases are
> expected to come up more often. The following applies equally well to
> methods in abstract classes (or concrete classes) as to defaults.
>
> There are lots of things we might want to document about a method in an
> API. Historically we've framed them as either being "specification"
> (e.g., necessary postconditions) or "implementation notes" (e.g., hints
> that give the user an idea what's going on under the hood.) But really,
> there are four boxes (and we've been cramming them into two):
>
> { API, implementation } x { specification, notes }
>
> (We sometimes use the terms normative/informative to describe the
> difference between specification/notes.)
>
> As background, here are some example uses for default methods which vary
> in their "expected prevalence of overriding".
> I think the variety of
> use cases here has contributed to the confusion on how to document
> implementation characteristics. (Note that all of these have analogues
> in abstract classes too; one can find examples in Abstract{List,Map,Set}.)
>
> 1. Optional methods. This is when the default implementation is barely
> conformant, such as the following from Iterator:
>
>     public default void remove() {
>         throw new UnsupportedOperationException("remove");
>     }
>
> It adheres to its contract, because the contract is explicitly weak, but
> any class that cares about removal will definitely want to override it.
>
> 2. Methods with *reasonable* defaults but which might well be
> overridden by implementations that care enough. For example, again from
> Iterator:
>
>     default void forEach(Consumer<? super T> consumer) {
>         while (hasNext())
>             consumer.accept(next());
>     }
>
> This implementation is perfectly fine for most implementations, but some
> classes (e.g., ArrayList) might have the chance to do better, if their
> maintainers are sufficiently motivated to do so. The new methods on Map
> (e.g., putIfAbsent) are also in this bucket.
>
> 3. Methods where it's pretty unlikely anyone will ever override them,
> such as this method from Predicate:
>
>     public default Predicate<T> and(Predicate<? super T> p) {
>         Objects.requireNonNull(p);
>         return (T t) -> test(t) && p.test(t);
>     }
>
> These are all common enough cases. The primary reason that the Javadoc
> needs to provide some information about the implementation, separate
> from the API specification, is so that those who would extend these
> classes or interfaces can know which methods they need to / want to
> override. It should be clear from the doc that anyone who implements
> Iterator MUST implement remove() if they want removal to happen, CAN
> override forEach if they think it will result in better performance, and
> almost certainly doesn't need to override Predicate.and().
>
> The question is made more complicated by the prevalent use of the
> ambiguous phrase "this implementation." We often use "this
> implementation" to describe both normative and informative aspects of
> the implementation, and readers are left to guess which. (Does "this
> implementation" mean all versions of Oracle's JDK forever? The current
> version in Oracle's JDK? All versions of all JDKs? The implementation
> in a specific class? Could IBM's JDK throw a different exception than
> UOE from the default of Iterator.remove()? What happens when the doc is
> @inheritDoc'ed into a subclass that overrides the method? Etc. The
> phrase is too vague to be useful, and this vagueness has been the
> subject of many bug reports.)
>
> I think one measure of success of this effort should be "can we replace
> all uses of 'this implementation' with something that is more
> informative and fits neatly within the model."
>
> As said earlier, there are four boxes. Here are some descriptions of
> what belongs in each box.
>
> 1. API specification. This is the one we know and love; a description
> that applies equally to all valid implementations of the method,
> including preconditions, postconditions, etc.
>
> 2. API notes. Commentary, rationale, or examples pertaining to the API.
>
> 3. Implementation specification. This is where we say what it means to
> be a valid default implementation (or an overrideable implementation in
> a class), such as "throws UOE." Similarly, this is where we'd describe
> what the default for putIfAbsent does. It is from this box that the
> would-be implementer gets enough information to make a sensible decision
> as to whether or not to override.
>
> 4. Implementation notes. Informative notes about the implementation,
> such as performance characteristics that are specific to the
> implementation in this class in this JDK in this version, and might
> change. These things are allowed to vary across platforms, vendors and
> versions.
>
> Once we recognize that these are the four boxes, I think everything gets
> simpler.
>
> Strawman Proposal
> -----------------
>
> As a strawman proposal, here's one way to explicitly label the four
> boxes: add three new Javadoc tags, @apinote, @implspec, and @implnote.
> (The remaining box, API Spec, needs no new tag, since that's how Javadoc
> is used already.) @impl{spec,note} can apply equally well to a concrete
> method in a class or a default method in an interface.
>
> (Rule of engagement: bikeshedding the names will be interpreted as a
> waiver to ever say anything about the model or the semantics. So you
> may bikeshed, but it must be your last word on the topic.)
>
> /**
>  * ... API specifications ...
>  *
>  * @apinote
>  * ... API notes ...
>  *
>  * @implspec
>  * ... implementation specification ...
>  *
>  * @implnote
>  * ... implementation notes ...
>  *
>  * @param ...
>  * @return ...
>  */
>
> Applying this to some existing Javadoc, take AbstractMap.putAll:
>
>     Copies all of the mappings from the specified map to this map
>     (optional operation). The effect of this call is equivalent to
>     that of calling put(k, v) on this map once for each mapping from
>     key k to value v in the specified map. The behavior of this
>     operation is undefined if the specified map is modified while
>     the operation is in progress.
>
>     This implementation iterates over the specified map's
>     entrySet() collection, and calls this map's put operation
>     once for each entry returned by the iteration.
>
> The first paragraph is API specification and the second is
> implementation *specification*, as users expect the implementation in
> AbstractMap, regardless of version or vendor, to behave this way. The
> change here would be to replace "This implementation" with @implspec,
> and the ambiguity over "this implementation" goes away.
>
> The doc for Iterator.remove could be:
>
> /**
>  * Removes from the underlying collection the last element returned by
>  * this iterator (optional operation). This method can be called only
>  * once per call to next(). The behavior of an iterator is unspecified
>  * if the underlying collection is modified while the iteration is in
>  * progress in any way other than by calling this method.
>  *
>  * @implspec
>  * The default implementation must throw UnsupportedOperationException.
>  *
>  * @implnote
>  * For purposes of efficiency, the same UnsupportedOperationException
>  * instance is always thrown. [*]
>  */
>
> [*] We don't really intend to implement it this way; this is just an
> example of an @implnote.
>
> The doc for Map.putIfAbsent could be:
>
> /**
>  * If the specified key is not already associated with a value,
>  * associates it with the given value.
>  *
>  * @implspec
>  * The default behaves as if:
>  * <pre> {@code
>  * if (!map.containsKey(key))
>  *   return map.put(key, value);
>  * else
>  *   return map.get(key);
>  * } </pre>
>  *
>  * @implnote
>  * This default implementation is implemented essentially as described
>  * in the API note. This operation is not atomic. Atomicity, if desired,
>  * must be provided by a subclass implementation.
>  */
>
> Secondary: one can build on this to eliminate some common inheritance
> anomalies by making these inheritable separately, where @inheritDoc is
> interpreted as "inherit the stuff from the corresponding section." This
> is backward compatible because these sections do not yet exist in old
> docs. So to inherit API spec and implementation spec, you would do:
>
> /**
>  * {@inheritDoc}
>  * @implspec
>  * {@inheritDoc}
>  * ...
>  */

--
- DML

From dl at cs.oswego.edu Sun Feb 3 07:02:42 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 03 Feb 2013 10:02:42 -0500
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510B2903.4070000@oracle.com>
References: <510B2903.4070000@oracle.com>
Message-ID: <510E7C12.7020100@cs.oswego.edu>

On 01/31/13 21:31, Brian Goetz wrote:

> { API, implementation } x { specification, notes }
>
> /**
>  * ... API specifications ...
>  *
>  * @apinote
>  * ... API notes ...
>  *
>  * @implspec
>  * ... implementation specification ...
>  *
>  * @implnote
>  * ... implementation notes ...
>  *
>  * @param ...
>  * @return ...
>  */

This sounds about right. Even though 90% of future @implspecs will probably be for default methods, the need to use workarounds for lack of them regularly arises in other cases.

I'm not completely sure about @apinote though. For example, something saying that implementations may have a resource bound/capacity/default (without saying what it is) is part of a spec, not just a note, so I hope people don't use it as such. (Further, while there could then be an @implnote saying what that bound/etc value currently is, it is not always a great idea to do it when nothing else depends on the choice. Even saying what it is sometimes invites future problems when you need to change it.)
Similarly for some performance-related issues. For example, TreeMap should say as part of its spec that any implementation must have expected/amortized O(log n) get and put operations. It currently goes further and says that the implementation is based on red-black trees, but that should probably be in an @implnote. If we take these new categories seriously, we'll want to do a pass through most java.util (and related JDK) javadocs to carry this out consistently.

And so on. So the only remaining role of @apinote is for misc rationales, warnings about potential future changes, and things like that. Which usually flow better textually in javadoc if just made part of the description. So I don't see myself using it much if ever. But since I can imagine uses here and there, I guess I have nothing against it.

> Secondary: one can build on this to eliminate some common inheritance anomalies
> by making these inheritable separately, where @inheritDoc is interpreted as
> "inherit the stuff from the corresponding section." This is backward compatible
> because these sections do not yet exist in old docs. So to inherit API spec and
> implementation spec, you would do:
>
> /**
>  * {@inheritDoc}
>  * @implspec
>  * {@inheritDoc}
>  * ...
>  */

Yes. We've had to do huge amounts of copy/paste/hack of javadocs, especially in java.util.concurrent, to work around this problem.

So, all-in-all: Yes, please do this.

-Doug

From brian.goetz at oracle.com Sun Feb 3 07:17:52 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 03 Feb 2013 10:17:52 -0500
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510E7C12.7020100@cs.oswego.edu>
References: <510B2903.4070000@oracle.com> <510E7C12.7020100@cs.oswego.edu>
Message-ID: <510E7FA0.8060309@oracle.com>

"not completely sure" is a reasonable place to be with @apinote. Of the four, it's definitely the least useful.
But I think the "2x2" structure as proposed is a more sound basis than the "3" you'd get by taking it out.

Your other notes amount to "people could get it wrong." Which is true, though there's plenty of room to get it wrong with the current scheme too. I have a hard time believing we'll make it worse. And it clearly addresses some long-standing gaps in our ability to document, which are about to get broader with the introduction of default methods.

On 2/3/2013 10:02 AM, Doug Lea wrote:
> On 01/31/13 21:31, Brian Goetz wrote:
>
>> { API, implementation } x { specification, notes }
>>
>> /**
>>  * ... API specifications ...
>>  *
>>  * @apinote
>>  * ... API notes ...
>>  *
>>  * @implspec
>>  * ... implementation specification ...
>>  *
>>  * @implnote
>>  * ... implementation notes ...
>>  *
>>  * @param ...
>>  * @return ...
>>  */
>
> This sounds about right. Even though 90% of future @implspecs will
> probably be for default methods, the need to use workarounds
> for lack of them regularly arises in other cases.
>
> I'm not completely sure about @apinote though.
> For example, something saying that implementations
> may have resource bound/capacity/default (without saying what
> it is), is part of a spec, not just a note, so I
> hope people don't use it as such.
> (Further, while there could then be an @implnote
> saying what that bound/etc value currently is, it is not
> always a great idea to do it when nothing else depends
> on the choice. Even saying what it is sometimes invites future
> problems when you need to change it.)
>
> Similarly for some performance-related issues. For example
> TreeMap should say as part of its spec that any implementation
> must have expected/amortized O(log n) get and put operations.
> It currently goes further and says that the implementation is
> based on red-black trees, but that should probably be in
> an @implnote.
> If we take these new categories
> seriously, we'll want to do a pass through most java.util
> (and related JDK) javadocs to carry this out consistently.
>
> And so on. So the only remaining role of @apinote is
> for misc rationales, warnings about potential future
> changes, and things like that. Which usually textually
> flow better in javadoc if just made part of the description.
> So I don't see myself using it much if ever. But since I
> can imagine uses here and there, I guess I have nothing against
> it.
>
>> Secondary: one can build on this to eliminate some common inheritance
>> anomalies by making these inheritable separately, where @inheritDoc is
>> interpreted as "inherit the stuff from the corresponding section."
>> This is backward compatible because these sections do not yet exist
>> in old docs. So to inherit API spec and implementation spec, you
>> would do:
>>
>> /**
>>  * {@inheritDoc}
>>  * @implspec
>>  * {@inheritDoc}
>>  * ...
>>  */
>
> Yes. We've had to do huge amounts of copy/paste/hack of
> javadocs, especially in java.util.concurrent, to work around this
> problem.
>
> So, all-in-all: Yes, please do this.
>
> -Doug

From dl at cs.oswego.edu Sun Feb 3 07:40:48 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 03 Feb 2013 10:40:48 -0500
Subject: Javadoc conventions in the presence of default methods
In-Reply-To: <510E7FA0.8060309@oracle.com>
References: <510B2903.4070000@oracle.com> <510E7C12.7020100@cs.oswego.edu> <510E7FA0.8060309@oracle.com>
Message-ID: <510E8500.90907@cs.oswego.edu>

On 02/03/13 10:17, Brian Goetz wrote:
> Your other notes amount to "people could get it wrong." Which is true, though
> there's plenty of room to get it wrong with the current scheme too. I have a
> hard time believing we'll make it worse.

Absolutely. I don't mean to imply anything other than that this is a big improvement.
-Doug

From brian.goetz at oracle.com Mon Feb 4 12:37:11 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 04 Feb 2013 15:37:11 -0500
Subject: explode (was: Stream method survey responses)
In-Reply-To: <5101706E.3030601@oracle.com>
References: <5101706E.3030601@oracle.com>
Message-ID: <51101BF7.9070409@oracle.com>

> From this, here's what I think is left to do:
> - More work on explode needed
> ...

Circling back to this. Clearly explode() is not done. Let me try and capture all the relevant info in one place.

Let's start with some background. Why do we want this method at all? Well, it's really useful! It's fairly common to do things like:

  Stream<Order> orders = ...
  Stream<LineItem> lineItems  // explicit declaration for clarity
      = orders.explode(... order.getLineItems() ...)

and it is often desirable to then do streamy things on the stream of line items. Those who have used flatMap in Scala get used to having it quite quickly, and would be very sad if it were taken away. Ask Don how many examples in his katas use it. (Doug will also point out that if you have flatMap, you don't really need map or filter -- see examples in CHM -- since both can be layered atop flatMap, modulo performance concerns.)

(It does have the potential to put some stress on the system when an element can be mapped to very large collections, because it sucks you into the problem of nested parallelism. (This is the inverse of another problem we already have, which is when filter stages have very high selectivity, and we end up with a lot of splitting overhead for the number of elements that end up at the tail of the pipeline.) But when mapping an element to a small number of other elements, as is common in a lot of use cases, there is generally no problem here.)

Scala has a method flatMap on whatever they'd call Stream, which takes a function T -> Stream[U] and produces a Stream[U].
More generally, this shape of flatMap applies (and is supported by higher-kinded generics) to many traits in the Scala library, where you have a Foo[A] and the flatMap method takes an A -> Foo[B] and produces a Foo[B]. (Our generics can't capture this.) This is the shape for flatMap that everyone really wants.

But here, we run into unfortunate reality: this works great in functional languages with cheap structural types, but that's not Java. I took it as a key design goal for flatMap/mapMulti/explode that: if an element t maps to nothing, the implementation should do as close to zero work as possible.

This rules out shaping flatMap as:

  <U> Stream<U> flatMap(Function<T, Collection<U>> mapper)

because, if you don't already have the collection lying around, the lambdas for this are nasty to write (try writing one, and you'll see what I mean), inefficient to execute, and create work for the library to iterate the result. In the limit, where t maps to the empty collection, creating and iterating an empty collection for each element is nuts. (By contrast, in languages like Haskell, wrapping elements with lists is very cheap.) However, the above shape is desirable as a convenience in the case you already do have a collection lying around. So let's put it in the bin of "nice conveniences to also deliver when we solve the main problem."

It also rules out shaping flatMap as:

  <U> Stream<U> flatMap(Function<T, Stream<U>> mapper)

because that's even worse -- creating ad-hoc streams is even more expensive than creating collections.

To simplify, imagine there are two use cases we have to satisfy:
- map element to generator (general case)
- map element to collection (convenience case)

The other cases (to array, to stream) are similar enough to the collection case.
To illustrate the general "generator" case, here's an example of a lambda (using the current API) that takes a Stream of String and produces a Stream of Integer values which are the characters of that stream:

  (sink, element) -> {
      for (int i = 0; i < element.length(); i++)
          sink.accept((int) element.charAt(i));
  }

Compare that to the collection-bearing version:

  element -> {
      ArrayList<Integer> list = new ArrayList<>();
      for (int i = 0; i < element.length(); i++)
          list.add((int) element.charAt(i));
      return list;
  }

or, short-circuiting the empty case:

  element -> {
      if (element.length() == 0)
          return Collections.emptyList();
      ArrayList<Integer> list = new ArrayList<>();
      for (int i = 0; i < element.length(); i++)
          list.add((int) element.charAt(i));
      return list;
  }

(This is the Collection case.)

Erasure plays a role here too. Ideally, it would be nice to overload methods for

  flatMap(Function<T, Collection<U>>)
  flatMap(Function<T, Stream<U>>)

but under erasure the overloads clash. The original API had only:

  <U> Stream<U> flatMap(MultiFunction<T, U> mf)

where MultiFunction<T, U> was (T, Consumer<U>) -> void. If users already had a Collection lying around, they had to iterate it themselves:

  (element, sink) -> {
      for (U u : findCollection(t))
          sink.accept(u);
  }

which isn't terrible but people didn't like it -- I think not because it was hard to read, but hard to figure out how to use flatMap at all. The current iteration provides a helper class with helper methods for handling collections, arrays, and streams, but you still have to wrap your head around why you're being passed two things before doing anything -- and I think it's the "before doing anything" part that really messes people up.

So, here are two alternatives that I hope may be better (and not run into problems with type inference).

Alternative A: overloading on method names.

  // Map T -> Collection<U>
  public <U> StreamA<U> explodeToCollection(Function<T, Collection<U>> mapper);

  // Map T -> U[]
  public <U> StreamA<U> explodeToArray(Function<T, U[]> mapper);

  // Generator case -- pass a T and a Consumer<U>
  public <U> StreamA<U> explodeToConsumer(BiConsumer<T, Consumer<U>> mapper);

  // Alternate version of generator case -- with named SAM instead
  public <U> StreamA<U> altExplodeToConsumer(Exploder<T, U> mapper);

  interface Exploder<T, U> {
      void explode(T element, Consumer<U> consumer);
  }

Here, we have various explodeToXxx methods (naming is purely illustrative) that defeat the erasure problem. Users seeking the T -> Collection version can use the appropriate versions with no problem.
When said users discover that their performance sucks, they have motivation to learn to use the more efficient generator version.

Usage examples:

  StreamA<Integer> a1 = a.explodeToArray(i -> new Integer[] { i });
  StreamA<Integer> a2 = a.explodeToCollection(i -> Collections.singleton(i));
  StreamA<Integer> a3 = a.explodeToConsumer((i, sink) -> sink.accept(i));

Alternative B: overload on SAMs. This involves three SAMs:

  interface Exploder<T, U> {
      void explode(T element, Consumer<U> consumer);
  }

  interface MapperToCollection<T, U> extends Function<T, Collection<U>> { }

  interface MapperToArray<T, U> extends Function<T, U[]> { }

And three overloaded explode() methods:

  public <U> StreamB<U> explode(MapperToCollection<T, U> exploder);
  public <U> StreamB<U> explode(MapperToArray<T, U> exploder);
  public <U> StreamB<U> explode(Exploder<T, U> exploder);

Usage examples:

  StreamB<Integer> b1 = b.explode(i -> new Integer[] { i });
  StreamB<Integer> b2 = b.explode(i -> Collections.singleton(i));
  StreamB<Integer> b3 = b.explode((i, sink) -> sink.accept(i));

I think the second approach is pretty decent. Users can easily understand the first two versions and use them while wrapping their head around the third.

From kevinb at google.com Mon Feb 4 12:52:17 2013
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 4 Feb 2013 12:52:17 -0800
Subject: explode (was: Stream method survey responses)
In-Reply-To: <51101BF7.9070409@oracle.com>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com>
Message-ID: 

Only a quick question first:

On Mon, Feb 4, 2013 at 12:37 PM, Brian Goetz wrote:

> The original API had only:
>
>   <U> Stream<U> flatMap(MultiFunction<T, U> mf)
>
> where MultiFunction<T, U> was (T, Consumer<U>) -> void. If users already had a
> Collection lying around, they had to iterate it themselves:
>
>   (element, sink) -> {
>       for (U u : findCollection(t))
>           sink.accept(u);
>   }

Could that simply be (t, sink) -> findCollection(t).forEach(sink) ?

--
Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From brian.goetz at oracle.com Mon Feb 4 12:53:53 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 04 Feb 2013 15:53:53 -0500
Subject: explode
In-Reply-To: 
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com>
Message-ID: <51101FE1.7020108@oracle.com>

> (element, sink) -> {
>     for (U u : findCollection(t))
>         sink.accept(u);
> }
>
> Could that simply be (t, sink) -> findCollection(t).forEach(sink) ?

Yes.

From brian.goetz at oracle.com Mon Feb 4 12:58:35 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 04 Feb 2013 15:58:35 -0500
Subject: explode
In-Reply-To: <51101BF7.9070409@oracle.com>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com>
Message-ID: <511020FB.5000509@oracle.com>

> Alternative A: overloading on method names.
> Alternative B: overload on SAMs. This involves three SAMs:

To contrast these:
- A has uglier method names, but fewer new types
- B has prettier method names (and therefore prettier use-site usage), but introduces more new ancillary types and puts more stress on type inference.

Specifically, I am wondering how we're going to represent "explode Foo to ints" -- which is probably an important use case.

From brian.goetz at oracle.com Tue Feb 5 08:53:42 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 05 Feb 2013 11:53:42 -0500
Subject: Collectors update
In-Reply-To: <51085788.4080705@univ-mlv.fr>
References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr>
Message-ID: <51113916.1010809@oracle.com>

>> 4. Rejigger Partition to return an array again, with an explicit
>> lambda (which will likely be an array ctor ref) to make the array.
>> Eliminated the silly Partition class.
>
> Please don't do that, it's pure evil.
> public static <T> Collector<T, Stream<T>[]>
> partitioningBy(Predicate<? super T> predicate, IntFunction<Stream<T>[]>
> arraySupplier) {

I've refactored this to make the partition collectors return Map<Boolean, X>.
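For reference, the refactor described here is what shipped: Collectors.partitioningBy in the final Java 8 API yields a Map keyed by Boolean, with both keys always present. A minimal sketch against that final API (this sketch uses List.of, a Java 9 convenience, for brevity):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartitionDemo {
    public static void main(String[] args) {
        // Partition by parity; here X is List<Integer>, the default downstream.
        Map<Boolean, List<Integer>> byParity = Stream.of(1, 2, 3, 4, 5)
                .collect(Collectors.partitioningBy(i -> i % 2 == 0));

        System.out.println(byParity.get(true));   // [2, 4]
        System.out.println(byParity.get(false));  // [1, 3, 5]

        // Both keys are present even when the stream is empty.
        Map<Boolean, List<Integer>> empty = Stream.<Integer>empty()
                .collect(Collectors.partitioningBy(i -> i % 2 == 0));
        System.out.println(empty.get(true).isEmpty()
                && empty.get(false).isEmpty());   // true
    }
}
```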
From forax at univ-mlv.fr Tue Feb 5 11:11:31 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 05 Feb 2013 20:11:31 +0100 Subject: Collectors update In-Reply-To: <51113916.1010809@oracle.com> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> Message-ID: <51115963.2060209@univ-mlv.fr> On 02/05/2013 05:53 PM, Brian Goetz wrote: >>> 4. Rejigger Partition to return an array again, with an explicit >>> lambda (which will likely be an array ctor ref) to make the array. >>> Eliminated the silly Partition class. >> >> Please don't do that, it's pure evil. >> public static Collector[]> >> partitioningBy(Predicate predicate, IntFunction[]> >> arraySupplier) { > > I've refactored this to make the partition collectors return > Map. I think returning a boolean -> T (or Boolean -> T) is better because it's conceptually more lightweight than a Map. I expect to see more function instead of a Map returned as result of a method. Otherwise, like any other Map returned by the JDK, it should be serializable. R?mi From kevinb at google.com Tue Feb 5 12:20:33 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Tue, 5 Feb 2013 12:20:33 -0800 Subject: Collectors update In-Reply-To: <51115963.2060209@univ-mlv.fr> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> Message-ID: On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax wrote: 4. Rejigger Partition to return an array again, with an explicit >>>> lambda (which will likely be an array ctor ref) to make the array. >>>> Eliminated the silly Partition class. >>>> >>> >>> Please don't do that, it's pure evil. >>> public static Collector[]> >>> partitioningBy(Predicate predicate, IntFunction[]> >>> arraySupplier) { >>> >> >> I've refactored this to make the partition collectors return Map> X>. 
>> > > I think returning a boolean -> T (or Boolean -> T) is better because it's > conceptually more lightweight than a Map. > I expect to see more function instead of a Map returned as result of a > method. > I'd have to disagree; I expect function objects to be little things I pass * in*, but I think it's more intuitive to expect a proper data structure back out. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Tue Feb 5 12:22:02 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 05 Feb 2013 15:22:02 -0500 Subject: Collectors update In-Reply-To: References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> Message-ID: <511169EA.1030109@oracle.com> I concur with Kevin. On 2/5/2013 3:20 PM, Kevin Bourrillion wrote: > On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax > wrote: > > 4. Rejigger Partition to return an array again, with an > explicit > lambda (which will likely be an array ctor ref) to make > the array. > Eliminated the silly Partition class. > > > Please don't do that, it's pure evil. > public static Collector[]> > partitioningBy(Predicate predicate, > IntFunction[]> > arraySupplier) { > > > I've refactored this to make the partition collectors return > Map. > > > I think returning a boolean -> T (or Boolean -> T) is better because > it's conceptually more lightweight than a Map. > I expect to see more function instead of a Map returned as result of > a method. > > > I'd have to disagree; I expect function objects to be little things I > pass /in/, but I think it's more intuitive to expect a proper data > structure back out. > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > From forax at univ-mlv.fr Tue Feb 5 12:46:57 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 05 Feb 2013 21:46:57 +0100 Subject: Collectors update In-Reply-To: <511169EA.1030109@oracle.com> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> <511169EA.1030109@oracle.com> Message-ID: <51116FC1.1030206@univ-mlv.fr> On 02/05/2013 09:22 PM, Brian Goetz wrote: > I concur with Kevin. We should remove Consumer.chain() in that case. R?mi > > On 2/5/2013 3:20 PM, Kevin Bourrillion wrote: >> On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax > > wrote: >> >> 4. Rejigger Partition to return an array again, with an >> explicit >> lambda (which will likely be an array ctor ref) to make >> the array. >> Eliminated the silly Partition class. >> >> >> Please don't do that, it's pure evil. >> public static Collector[]> >> partitioningBy(Predicate predicate, >> IntFunction[]> >> arraySupplier) { >> >> >> I've refactored this to make the partition collectors return >> Map. >> >> >> I think returning a boolean -> T (or Boolean -> T) is better because >> it's conceptually more lightweight than a Map. >> I expect to see more function instead of a Map returned as result of >> a method. >> >> >> I'd have to disagree; I expect function objects to be little things I >> pass /in/, but I think it's more intuitive to expect a proper data >> structure back out. >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> From brian.goetz at oracle.com Tue Feb 5 12:54:01 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 05 Feb 2013 15:54:01 -0500 Subject: Collectors update In-Reply-To: <51116FC1.1030206@univ-mlv.fr> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> <511169EA.1030109@oracle.com> <51116FC1.1030206@univ-mlv.fr> Message-ID: <51117169.3090605@oracle.com> That's silly. We didn't say anything about "nothing should return a function." Kevin is completely right that collect() is a data-oriented operation and should return a real data structure. Consumer.chain() is a higher-order function; functions in, functions out -- no data involved. On 2/5/2013 3:46 PM, Remi Forax wrote: > On 02/05/2013 09:22 PM, Brian Goetz wrote: >> I concur with Kevin. > > We should remove Consumer.chain() in that case. > > R?mi > >> >> On 2/5/2013 3:20 PM, Kevin Bourrillion wrote: >>> On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax >> > wrote: >>> >>> 4. Rejigger Partition to return an array again, with an >>> explicit >>> lambda (which will likely be an array ctor ref) to make >>> the array. >>> Eliminated the silly Partition class. >>> >>> >>> Please don't do that, it's pure evil. >>> public static Collector[]> >>> partitioningBy(Predicate predicate, >>> IntFunction[]> >>> arraySupplier) { >>> >>> >>> I've refactored this to make the partition collectors return >>> Map. >>> >>> >>> I think returning a boolean -> T (or Boolean -> T) is better because >>> it's conceptually more lightweight than a Map. >>> I expect to see more function instead of a Map returned as result of >>> a method. >>> >>> >>> I'd have to disagree; I expect function objects to be little things I >>> pass /in/, but I think it's more intuitive to expect a proper data >>> structure back out. >>> >>> >>> -- >>> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >>> > From forax at univ-mlv.fr Tue Feb 5 13:38:13 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 05 Feb 2013 22:38:13 +0100 Subject: Collectors update In-Reply-To: <51117169.3090605@oracle.com> References: <51084E8C.2060403@oracle.com> <51085788.4080705@univ-mlv.fr> <51113916.1010809@oracle.com> <51115963.2060209@univ-mlv.fr> <511169EA.1030109@oracle.com> <51116FC1.1030206@univ-mlv.fr> <51117169.3090605@oracle.com> Message-ID: <51117BC5.8000501@univ-mlv.fr> On 02/05/2013 09:54 PM, Brian Goetz wrote: > That's silly. We didn't say anything about "nothing should return a > function." > > Kevin is completely right that collect() is a data-oriented operation > and should return a real data structure. > > Consumer.chain() is a higher-order function; functions in, functions > out -- no data involved. Why using Map is a "real data structure" ? I think I prefer to have a real type like Partition (as Don said). R?mi > > On 2/5/2013 3:46 PM, Remi Forax wrote: >> On 02/05/2013 09:22 PM, Brian Goetz wrote: >>> I concur with Kevin. >> >> We should remove Consumer.chain() in that case. >> >> R?mi >> >>> >>> On 2/5/2013 3:20 PM, Kevin Bourrillion wrote: >>>> On Tue, Feb 5, 2013 at 11:11 AM, Remi Forax >>> > wrote: >>>> >>>> 4. Rejigger Partition to return an array again, >>>> with an >>>> explicit >>>> lambda (which will likely be an array ctor ref) to >>>> make >>>> the array. >>>> Eliminated the silly Partition class. >>>> >>>> >>>> Please don't do that, it's pure evil. >>>> public static Collector[]> >>>> partitioningBy(Predicate predicate, >>>> IntFunction[]> >>>> arraySupplier) { >>>> >>>> >>>> I've refactored this to make the partition collectors return >>>> Map. >>>> >>>> >>>> I think returning a boolean -> T (or Boolean -> T) is better >>>> because >>>> it's conceptually more lightweight than a Map. >>>> I expect to see more function instead of a Map returned as >>>> result of >>>> a method. 
>>>> >>>> >>>> I'd have to disagree; I expect function objects to be little things I >>>> pass /in/, but I think it's more intuitive to expect a proper data >>>> structure back out. >>>> >>>> >>>> -- >>>> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >>>> >> From brian.goetz at oracle.com Wed Feb 6 14:12:15 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 06 Feb 2013 17:12:15 -0500 Subject: Collectors update redux Message-ID: <5112D53F.2080205@oracle.com> Did more tweaking with Collectors. Recall there are two basic forms of the collect method: The most basic one is the "on ramp", which doesn't require any understanding of Collector or the combinators therein; it is basically the mutable version of reduce. It looks like: collect(() -> R, (R,T) -> void, (R,R) -> void) The API shape is defined so that most invocations will work with method references: // To ArrayList collect(ArrayList::new, ArrayList::add, ArrayList::addAll) Note that this works in parallel too; we create list at the leaves with ::add, and merge them up the tree with ::addAll. // String concat collect(StringBuilder::new, StringBuilder::append, StringBuilder::append) // Turn an int stream to a BitSet with those bits set collect(BitSet::new, BitSet::set, BitSet::or) // String join with delimiter collect(() -> new StringJoiner(", "), StringJoiner::append, StringJoiner::append) Again, all these work in parallel. Digression: the various forms of reduce/etc form a ladder in terms of complexity: If you understand reduction, you can understand... ...reduce(T, BinaryOperator) If you understand the above + Optional, you can then understand... ...reduce(BinaryOperator) If you understand the above + "fold" (nonhomogeneous reduction), you can then understand... ...reduce(U, BiFunction accumulator, BinaryOperator); If you understand the above + "mutable fold" (inject), you can then understand... 
...collect(Supplier, (R,T) -> void, (R,R) -> void) If you understand the above + "Collector", you can then understand... ...collect(Collector) This is all supported by the principle of commensurate effort; learn a little more, can do a little more. OK, exiting digression, moving to the end of the list, those that use "canned" Collectors. collect(Collector) collectUnordered(Collector) Collectors are basically a tuple of three lambdas and a boolean indicating whether the Collector can handle concurrent insertion: Collector = { () -> R, (R,T) -> void, (R,R) -> R, isConcurrent } Note there is a slight difference in the last argument, a BinaryOperator rather than a BiConsumer. The BinaryOperator form is more flexible (it can support appending two Lists into a tree representation without copying the elements, whereas the (R,R) -> void form can't.) This asymmetry is a rough edge, though in each case, the shape is "locally" optimal (in the three-arg version, the void form supports method refs better; in the Collector version, the result is more flexible, and that's where we need the flexibility.) But we could make them consistent at the cost of the above uses becoming more like: collect(StringBuilder::new, StringBuilder::append, (l, r) -> { l.append(r); return l; }) Overall I think the current API yields better client code at the cost of this slightly rough edge. 
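The "on ramp" examples above translate directly to runnable code; a sketch exercising the same method references (shapes as in the final API, which kept this three-argument collect with a (R,R) -> void combiner):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class OnRampCollect {
    public static void main(String[] args) {
        // supplier / accumulator / combiner -- the mutable analogue of reduce
        List<String> list = Stream.of("a", "b", "c")
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);

        // String concat: leaves accumulate with append(String),
        // partial builders merge with append(CharSequence)
        String s = Stream.of("a", "b", "c")
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();

        // IntStream to BitSet: accumulate with set(int), merge with or()
        BitSet bits = IntStream.of(1, 3, 5)
                .collect(BitSet::new, BitSet::set, BitSet::or);

        System.out.println(list + " " + s + " " + bits);
    }
}
```

All three run identically in parallel: the supplier creates a container per leaf, the accumulator fills it, and the combiner merges containers up the tree.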
The set of Collectors now includes: toCollection(Supplier) toList() toSet() toStringBuilder() toStringJoiner(delimiter) // mapping combinators (plus primitive specializations) mapping(T->U, Collector) // Single-level groupBy groupingBy(T->K) // groupBy with downstream Collector groupingBy(T->K, Collector) // grouping plus reduce groupingReduce(T->K, BinaryOperator) // reduce only groupingReduce(T->K, T->U, BinaryOperator) // map+reduce // join (nee mappedTo) joiningWith(T -> U) // produces Map<T, U> // partition partitioningBy(Predicate) partitioningBy(Predicate, Collector) partitioningReduce(Predicate, BinaryOperator) partitioningReduce(Predicate, T->U, BinaryOperator) // statistics (gathers sum, count, min, max, average) toLongStatistics() toDoubleStatistics() Plus, concurrent versions of most of these (which are suitable for unordered/contended/forEach-style execution.) Plus versions that let you offer explicit constructors for maps and collections. While these may seem like a lot, the implementations are highly compact -- all of these together, plus supporting machinery, fit in 500 LoC. These Collectors are designed around composability. (It is vaguely frustrating that we even have to separate the "with downstream Collector" versions from the reducing versions.) So they each have a form where you can do some level of categorization and then use a downstream collector to do further computation. This is very powerful. Examples, again using the familiar problem domain of transactions: class Txn { Buyer buyer(); Seller seller(); String description(); int amount(); } Transactions by buyer: Map<Buyer, List<Txn>> m = txns.collect(groupingBy(Txn::buyer)); Highest-dollar transaction by buyer: Map<Buyer, Txn> m = txns.collect( groupingReduce(Txn::buyer, Comparators.greaterOf( Comparators.comparing(Txn::amount)))); Here, comparing() takes the Txn -> amount function, and produces a Comparator; greaterOf(comparator) turns that Comparator into a BinaryOperator that corresponds to "max by comparator".
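For reference, groupingReduce and Comparators.greaterOf did not survive under those names; in the API as it shipped, the same "highest-dollar transaction by buyer" computation is spelled with groupingBy plus a reducing downstream collector (greaterOf(comparator) corresponds to BinaryOperator.maxBy(comparator)). A sketch, with Txn as a stand-in class and Buyer simplified to a String:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

public class GroupingReduceDemo {
    // Stand-in for the Txn class in the examples above.
    static class Txn {
        final String buyer; final int amount;
        Txn(String buyer, int amount) { this.buyer = buyer; this.amount = amount; }
        String buyer() { return buyer; }
        int amount() { return amount; }
    }

    public static void main(String[] args) {
        List<Txn> txns = Arrays.asList(
                new Txn("ann", 5), new Txn("bob", 7), new Txn("ann", 9));

        // Highest-dollar transaction per buyer: groupingBy + max-by reduction.
        Map<String, Optional<Txn>> biggest = txns.stream().collect(
                Collectors.groupingBy(Txn::buyer,
                        Collectors.reducing(
                                BinaryOperator.maxBy(Comparator.comparingInt(Txn::amount)))));
        System.out.println(biggest.get("ann").get().amount()); // 9

        // Just the amount, not the transaction -- the analogue of
        // groupingReduce(Txn::buyer, Txn::amount, Integer::max):
        Map<String, Integer> maxAmount = txns.stream()
                .collect(Collectors.toMap(Txn::buyer, Txn::amount, Integer::max));
        System.out.println(maxAmount.get("ann")); // 9
    }
}
```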
We then reduce on that, yielding highest-dollar transaction per buyer. Alternately, if you want the number, not the transaction: Map<Buyer, Integer> m = txns.collect(groupingReduce(Txn::buyer, Txn::amount, Integer::max)); Transactions by buyer, seller: Map<Buyer, Map<Seller, List<Txn>>> m = txns.collect(groupingBy(Txn::buyer, groupingBy(Txn::seller))); Transaction volume statistics by buyer, seller: Map<Buyer, Map<Seller, LongStatistics>> m = txns.collect(groupingBy(Txn::buyer, groupingBy(Txn::seller, mapping(Txn::amount, toLongStatistics())))); The statistics let you get at min, max, sum, count, and average from a single pass on the data (this trick taken from ParallelArray.) We can mix and match at various levels. For example: Transactions by buyer, partitioned into "large/small" groups: Predicate<Txn> isLarge = t -> t.amount() > BIG; Map<Buyer, Map<Boolean, List<Txn>>> m = txns.collect(groupingBy(Txn::buyer, partitioningBy(isLarge))); Or, turning it around: Map<Boolean, Map<Buyer, List<Txn>>> m = txns.collect(partitioningBy(isLarge, groupingBy(Txn::buyer))); Because Collector is public, Kevin can write and publish Guava-multimap-bearing versions of these -- probably in about ten minutes. From brian.goetz at oracle.com Wed Feb 6 14:32:28 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 06 Feb 2013 17:32:28 -0500 Subject: explode In-Reply-To: <51101BF7.9070409@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> Message-ID: <5112D9FC.9010707@oracle.com> Guys, we need to close on the open Stream API items relatively soon. Maybe we're almost there on flatMap. Of the alternatives for flatMap below, while Alternative B is attractive from a client code perspective, I think Alternative A is less risky with respect to stressing the compiler (and also introduces fewer new types.) So, semi-concrete proposal: Stream<U> flatMapToCollection(Function<T, Collection<U>>) Stream<U> flatMapToArray(Function<T, U[]>) // do we even need this?
Stream<U> flatMap(Function<T, Stream<U>>) Stream<U> flatMap(FlatMapper<T, U>) where interface FlatMapper<T, U> { void explodeInto(T t, Consumer<U> consumer); } with specializations for primitives: IntStream flatMap(FlatMapper.OfInt) ... etc We can then position flatMap as the "advanced" version, so from a "graduated learning" perspective, people will find fMTC first, if that meets their needs, great, and the Javadoc for fMTC can guide them to fM for the more advanced cases. On 2/4/2013 3:37 PM, Brian Goetz wrote: >> From this, here's what I think is left to do: >> - More work on explode needed > > ... > > Circling back to this. Clearly explode() is not done. Let me try and > capture all the relevant info in one place. > > Let's start with some background. Why do we want this method at all? > Well, it's really useful! It's fairly common to do things like: > > Stream<Order> orders = ... > Stream<LineItem> lineItems // explicit declaration for clarity > = orders.explode(... order.getLineItems() ...) > > and it is often desirable to then do streamy things on the stream of > line items. Those who have used flatMap in Scala get used to having it > quite quickly, and would be very sad if it were taken away. Ask Don how > many examples in his katas use it. (Doug will also point out that if > you have flatMap, you don't really need map or filter -- see examples in > CHM -- since both can be layered atop flatMap, modulo performance > concerns.) > > (It does have the potential to put some stress on the system when an > element can be mapped to very large collections, because it sucks you > into the problem of nested parallelism. (This is the inverse of another > problem we already have, which is when filter stages have very high > selectivity, and we end up with a lot of splitting overhead for the > number of elements that end up at the tail of the pipeline.) But when > mapping an element to a small number of other elements, as is common in > a lot of use cases, there is generally no problem here.)
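For concreteness, the orders-to-line-items example above under the Stream-returning shape (the form that eventually shipped as Stream.flatMap); Order and LineItem here are hypothetical stand-in classes:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapOrders {
    static class LineItem {
        final String sku;
        LineItem(String sku) { this.sku = sku; }
    }
    static class Order {
        final List<LineItem> items;
        Order(LineItem... items) { this.items = Arrays.asList(items); }
        List<LineItem> getLineItems() { return items; }
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order(new LineItem("a"), new LineItem("b")),
                new Order(),   // maps to nothing: contributes an empty stream
                new Order(new LineItem("c")));

        List<String> skus = orders.stream()
                .flatMap(o -> o.getLineItems().stream())   // T -> Stream<U>
                .map(li -> li.sku)
                .collect(Collectors.toList());

        System.out.println(skus); // [a, b, c]
    }
}
```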
> > Scala has a method flatMap on whatever they'd call Stream<T>, which > takes a function > > T -> Stream<U> > > and produces a Stream<U>. More generally, this shape of flatMap applies > (and is supported by higher-kinded generics) to many traits in the Scala > library, where you have a Foo[A] and the flatMap method takes an A -> > Foo[B] and produces a Foo[B]. (Our generics can't capture this.) > > This is the shape for flatMap that everyone really wants. But here, we > run into unfortunate reality: this works great in functional languages > with cheap structural types, but that's not Java. > > I took it as a key design goal for flatMap/mapMulti/explode that: if an > element t maps to nothing, the implementation should do as close to zero > work as possible. > > This rules out shaping flatMap as: > > Stream<U> flatMap(Function<T, Collection<U>>) > > because, if you don't already have the collection lying around, the > lambdas for this are nasty to write (try writing one, and you'll see > what I mean), inefficient to execute, and create work for the library to > iterate the result. In the limit, where t maps to the empty collection, > creating and iterating an empty collection for each element is nuts. (By > contrast, in languages like Haskell, wrapping elements with lists is > very cheap.) > > However, the above shape is desirable as a convenience in the case you > already do have a collection lying around. So let's put it in the bin > of "nice conveniences to also deliver when we solve the main problem." > > It also rules out shaping flatMap as: > > Stream<U> flatMap(Function<T, Stream<U>>) > > because that's even worse -- creating ad-hoc streams is even more > expensive than creating collections. > > To simplify, imagine there are two use cases we have to satisfy: > > - map element to generator (general case) > - map element to collection (convenience case) > > The other cases (to array, to stream) are similar enough to the > collection case.
> > To illustrate the general "generator" case, here's an example of a > lambda (using the current API) that takes a Stream of String and > produces a Stream of Integer values which are the characters of that > stream: > > (sink, element) -> { > for (int i=0; i<element.length(); i++) > sink.send(element.charAt(i)); > } > > It's efficient (no input, no output) and pretty easy to see what's going > on. Would be nicer if we could spell "sink.send" as "yield", but oh > well. Here's how we'd have to write that if we didn't support the > generator case: > > (element) -> { > ArrayList<Integer> list = new ArrayList<>(); > for (int i=0; i<element.length(); i++) > list.add(element.charAt(i)); > return list; > } > > Bigger and less efficient. And it gets uglier if we want to try and > optimize away the list creation in the empty case: > > (element) -> { > if (element.length() == 0) > return Collections.emptyList(); > ArrayList<Integer> list = new ArrayList<>(); > for (int i=0; i<element.length(); i++) > list.add(element.charAt(i)); > return list; > } > > We're really starting to lose sight of what this lambda does. (Hopefully > this will put to bed the notion that all we need is the T->Collection > case.) > > Erasure plays a role here too. Ideally, it would be nice to overload > methods for > > flatMap(Function<T, Collection<U>>) > flatMap(Function<T, U[]>) > > but obviously we can't do that (directly). > > > The original API had only: > > Stream<U> flatMap(MultiFunction<T, U> mf) > > where MultiFunction was (T, Consumer<U>) -> void. If users already had > a Collection lying around, they had to iterate it themselves: > > (element, sink) -> { > for (U u : findCollection(element)) > sink.accept(u); > } > > which isn't terrible but people didn't like it -- I think not because it > was hard to read, but hard to figure out how to use flatMap at all.
> > The current iteration provides a helper class with helper methods for > handling collections, arrays, and streams, but you still have to wrap > your head around why you're being passed two things before doing > anything -- and I think it's the "before doing anything" part that > really messes people up. > > > So, here are two alternatives that I hope may be better (and not run into > problems with type inference). > > Alternative A: overloading on method names. > > // Map T -> Collection<U> > public StreamA<U> explodeToCollection(Function<T, Collection<U>> > mapper); > > // Map T -> U[] > public StreamA<U> explodeToArray(Function<T, U[]> mapper); > > // Generator case -- pass a T and a Consumer > public StreamA<U> explodeToConsumer(BiConsumer<T, Consumer<U>> > mapper); > > // Alternate version of generator case -- with named SAM instead > public StreamA<U> altExplodeToConsumer(Exploder<T, U> mapper); > > interface Exploder<T, U> { > void explode(T element, Consumer<U> consumer); > } > > Here, we have various explodeToXxx methods (naming is purely > illustrative) that defeat the erasure problem. Users seeking the > T->Collection version can use the appropriate versions with no problem. > When said users discover that their performance sucks, they have > motivation to learn to use the more efficient generator version. > > Usage examples: > > StreamA<Integer> a1 > = a.explodeToArray(i -> new Integer[] { i }); > StreamA<Integer> a2 > = a.explodeToCollection(i -> Collections.singleton(i)); > StreamA<Integer> a3 > = a.explodeToConsumer((i, sink) -> sink.accept(i)); > > > Alternative B: overload on SAMs.
This involves three SAMs: > > interface Exploder<T, U> { > void explode(T element, Consumer<U> consumer); > } > > interface MapperToCollection<T, U> > extends Function<T, Collection<U>> { } > > interface MapperToArray<T, U> extends Function<T, U[]> { } > > And three overloaded explode() methods: > > public StreamB<U> explode(MapperToCollection<T, U> exploder); > > public StreamB<U> explode(MapperToArray<T, U> exploder); > > public StreamB<U> explode(Exploder<T, U> exploder); > > Usage examples: > > StreamB<Integer> b1 = b.explode(i -> new Integer[] { i }); > StreamB<Integer> b2 = b.explode(i -> Collections.singleton(i)); > StreamB<Integer> b3 = b.explode((i, sink) -> sink.accept(i)); > > > I think the second approach is pretty decent. Users can easily > understand the first two versions and use them while wrapping their head > around the third. > > From forax at univ-mlv.fr Wed Feb 6 14:50:24 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 06 Feb 2013 23:50:24 +0100 Subject: explode In-Reply-To: <5112D9FC.9010707@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> Message-ID: <5112DE30.7030704@univ-mlv.fr> On 02/06/2013 11:32 PM, Brian Goetz wrote: > Guys, we need to close on the open Stream API items relatively soon. > Maybe we're almost there on flatMap. > > Of the alternatives for flatMap below, I think while Alternative B is > attractive from a client code perspective, I think Alternative A is > less risky with respect to stressing the compiler (and also introduces > fewer new types.) > > So, semi-concrete proposal: > > Stream<U> flatMapToCollection(Function<T, Collection<U>>) > Stream<U> flatMapToArray(Function<T, U[]>) // do we even need this? > Stream<U> flatMap(Function<T, Stream<U>>) > Stream<U> flatMap(FlatMapper<T, U>) What about consistency? You said that we should not use Collection explicitly in the stream API hence we don't have toList(), toSet(), or groupBy() but collect(toList()), collect(toSet()) or collect(groupingBy) and at the same time, for flatMap which will be less used, you want to add flatMapToCollection, flatMapToArray.
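For reference, the collect(toList()) style Remi is citing is exactly how the shipped API keeps Collection off the Stream interface: the conveniences live on Collectors and are reached through the single collect() method. A sketch:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectStyle {
    public static void main(String[] args) {
        // Conveniences live on Collectors, not Stream:
        // collect(toList()) rather than a toList() method on Stream itself.
        List<String> l = Stream.of("a", "b").collect(Collectors.toList());
        Set<String> s = Stream.of("a", "b", "a").collect(Collectors.toSet());
        Map<Integer, List<String>> byLen = Stream.of("a", "bb", "cc")
                .collect(Collectors.groupingBy(String::length));

        System.out.println(l + " " + byLen.get(2));
    }
}
```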
I think you should be at least consistent, so either we have an Exploder like we have a Collector, or we have several overloads for flatMap, groupBy and toList/toSet. > > where > > interface FlatMapper { > void explodeInto(T t, Consumer consumer); > } > > with specializations for primitives: > > IntStream flatMap(FlatMapper.OfInt) > ... etc > > We can then position flatMap as the "advanced" version, so from a > "graduated learning" perspective, people will find fMTC first, if that > meets their needs, great, and the Javadoc for fMTC can guide them to > fM for the more advanced cases. R?mi > > > > On 2/4/2013 3:37 PM, Brian Goetz wrote: >>> From this, here's what I think is left to do: >>> - More work on explode needed >> > ... >> >> Circling back to this. Clearly explode() is not done. Let me try and >> capture all the relevant info in one place. >> >> Let's start with some background. Why do we want this method at all? >> Well, it's really useful! It's fairly common to do things like: >> >> Stream orders = ... >> Stream lineItems // explicit declaration for clarity >> = orders.explode(... order.getLineItems() ...) >> >> and it is often desirable to do then streamy things on the stream of >> line items. Those who have used flatMap in Scala get used to having it >> quite quickly, and would be very sad if it were taken away. Ask Don how >> many examples in his katas use it. (Doug will also point out that if >> you have flatMap, you don't really need map or filter -- see examples in >> CHM -- since both can be layered atop flatMap, modulo performance >> concerns.) >> >> (It does have the potential to put some stress on the system when an >> element can be mapped to very large collections, because it sucks you >> into the problem of nested parallelism. 
(This is the inverse of another >> problem we already have, which is when filter stages have very high >> selectivity, and we end up with a lot of splitting overhead for the >> number of elements that end up at the tail of the pipeline.) But when >> mapping an element to a small number of other elements, as is common in >> a lot of use cases, there is generally no problem here.) >> >> Scala has a method flatMap on whatever they'd call Stream, which >> takes a function >> >> T -> Stream >> >> and produces a Stream. More generally, this shape of flatMap applies >> (and is supported by higher-kinded generics) to many traits in the Scala >> library, where you have a Foo[A] and the flatMap method takes an A -> >> Foo[B] and produces a Foo[B]. (Our generics can't capture this.) >> >> This is the shape for flatMap that everyone really wants. But here, we >> run into unfortunate reality: this works great in functional languages >> with cheap structural types, but that's not Java. >> >> I took it as a key design goal for flatMap/mapMulti/explode that: if an >> element t maps to nothing, the implementation should do as close to zero >> work as possible. >> >> This rules out shaping flatMap as: >> >> Stream flatMap(Function>) >> >> because, if you don't already have the collection lying around, the >> lambdas for this are nasty to write (try writing one, and you'll see >> what I mean), inefficient to execute, and create work for the library to >> iterate the result. In the limit, where t maps to the empty collection, >> creating an iterating an empty collection for each element is nuts. (By >> contrast, in languages like Haskell, wrapping elements with lists is >> very cheap.) >> >> However, the above shape is desirable as a convenience in the case you >> already do have a collection lying around. So let's put it in the bin >> of "nice conveniences to also deliver when we solve the main problem." 
>> >> It also rules out shaping flatMap as: >> >> Stream flatMap(Function>) >> >> because that's even worse -- creating ad-hoc streams is even more >> expensive than creating collections. >> >> To simplify, imagine there are two use cases we have to satisfy: >> >> - map element to generator (general case) >> - map element to collection (convenience case) >> >> The other cases (to array, to stream) are similar enough to the >> collection case. >> >> To illustrate the general "generator" case, here's an example of a >> lambda (using the current API) that takes a Stream of String and >> produces a Stream of Integer values which are the characters of that >> stream: >> >> (sink, element) -> { >> for (int i=0; i> sink.send(element.charAt(i)); >> } >> >> It's efficient (no input, no output) and pretty easy to see what's going >> on. Would be nicer if we could spell "sink.send" as "yield", but oh >> well. Here's how we'd have to write that if we didn't support the >> generator case: >> >> (element) -> { >> ArrayList list = new ArrayList<>(); >> for (int i=0; i> list.add(element.charAt(i)); >> return list; >> } >> >> Bigger and less efficient. And it gets uglier if we want to try and >> optimize away the list creation in the empty case: >> >> (element) -> { >> if (element.length() == 0) >> return Collections.emptyList(); >> ArrayList list = new ArrayList<>(); >> for (int i=0; i> list.add(element.charAt(i)); >> return list; >> } >> >> We're really starting to lose sight of what this lambda does. (Hopefully >> this will put to bed the notion that all we need is the T->Collection >> case.) >> >> Erasure plays a role here too. Ideally, it would be nice to overload >> methods for >> >> flatMap(Function>) >> flatMap(Function> >> but obviously we can't do that (directly). >> >> >> The original API had only: >> >> Stream flatMap(MultiFunction mf) >> >> where MultiFunction was (T, Consumer) -> void. 
If users already had >> a Collection lying around, they had to iterate it themselves: >> >> (element, sink) -> { >> for (U u : findCollection(t)) >> sink.accept(u); >> } >> >> which isn't terrible but people didn't like it -- I think not because it >> was hard to read, but hard to figure out how to use flatMap at all. >> >> The current iteration provides a helper class with helper methods for >> handling collections, arrays, and streams, but you still have to wrap >> your head around why you're being passed two things before doing >> anything -- and I think its the "before doing anything" part that >> really messes people up. >> >> >> So, here's two alternatives that I hope may be better (and not run into >> problems with type inference). >> >> Alternative A: overloading on method names. >> >> // Map T -> Collection >> public StreamA explodeToCollection(Function> >> mapper); >> >> // Map T -> U[] >> public StreamA explodeToArray(Function mapper); >> >> // Generator case -- pass a T and a Consumer >> public StreamA explodeToConsumer(BiConsumer> >> mapper); >> >> // Alternate version of generator case -- with named SAM instead >> public StreamA altExplodeToConsumer(Exploder mapper); >> >> interface Exploder { >> void explode(T element, Consumer consumer); >> } >> >> Here, we have various explodeToXxx methods (naming is purely >> illustrative) that defeat the erasure problem. Users seeking the >> T->Collection version can use the appropriate versions with no problem. >> When said users discover that their performance sucks, they have >> motivation to learn to use the more efficient generator version. >> >> Usage examples: >> >> StreamA a1 >> = a.explodeToArray(i -> new Integer[] { i }); >> StreamA a2 >> = a.explodeToCollection(i -> Collections.singleton(i)); >> StreamA a3 >> = a.explodeToConsumer((i, sink) -> sink.accept(i)); >> >> >> Alternative B: overload on SAMs. 
This involves three SAMs:
>>
>>     interface Exploder<T, U> {
>>         void explode(T element, Consumer<U> consumer);
>>     }
>>
>>     interface MapperToCollection<T, U> extends Function<T, Collection<U>> { }
>>
>>     interface MapperToArray<T, U> extends Function<T, U[]> { }
>>
>> And three overloaded explode() methods:
>>
>>     public <U> StreamB<U> explode(MapperToCollection<T, U> exploder);
>>     public <U> StreamB<U> explode(MapperToArray<T, U> exploder);
>>     public <U> StreamB<U> explode(Exploder<T, U> exploder);
>>
>> Usage examples:
>>
>>     StreamB<Integer> b1 = b.explode(i -> new Integer[] { i });
>>     StreamB<Integer> b2 = b.explode(i -> Collections.singleton(i));
>>     StreamB<Integer> b3 = b.explode((i, sink) -> sink.accept(i));
>>
>> I think the second approach is pretty decent. Users can easily
>> understand the first two versions and use them while wrapping their head
>> around the third.

From brian.goetz at oracle.com Wed Feb 6 15:30:15 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 06 Feb 2013 18:30:15 -0500
Subject: explode
In-Reply-To: <5112DE30.7030704@univ-mlv.fr>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr>
Message-ID: <5112E787.5090809@oracle.com>

> You said that we should not use Collection explicitly in the stream API
> hence we don't have toList(), toSet(), or groupBy() but
> collect(toList()), collect(toSet()) or collect(groupingBy)
> and at the same time, for flatMap which will be less used, you want to
> add flatMapToCollection, flatMapToArray.

Yes, any coupling to Collection is undesirable and has to be justified. We're currently in a nice place (zero uses of Collection in Stream) so it would be nice to stay there, and one is a lot worse than zero. But be careful not to turn consistency into a goal unto itself.
For example, the use of Collections in Collectors is an ideal compromise; the important thing is they are out of the core interface which we expect every aggregate for the next 10+ years to implement, but are still available for easy use through standalone static helper methods like groupingBy. This is an ideal balance of giving users tools to do their job without tying Stream to Collection.

> I think you should be at least consistent, so either we have an Exploder
> like we have a Collector,
> or we have several overloads for flatMap, groupBy and toList/toSet.

Personally, I would (fairly strongly) prefer to have only:

    Stream<U> flatMap(FlatMapper<T, U>)

and

    Stream<U> flatMap(Function<T, Stream<U>>)

One can quite easily derive the Collection (and with slightly more work, array) cases from the first form (or the second form, with more runtime overhead):

    .flatMap((t, sink) -> getColl(t).forEach(sink))
    .flatMap(t -> getColl(t).stream())

In fact, the first is what we originally had. But then people howled that (a) "I can't understand flatMap" and (b) "I think flatMap should take a Function<T, Collection<U>>". In our early focus groups, people saw the base form of flatMap and universally cried "WTF?" People can't understand it. After 100 people make the same comment, you start to get that it's a pain point.

So, the proposal I made today attempts to take into account that people are not yet ready to understand this form of flatMap, and attempts to compromise. But I'll happily retreat from that, and vote for just

    Stream<U> flatMap(FlatMapper<T, U>)
    Stream<U> flatMap(Function<T, Stream<U>>)

It just seemed people weren't OK with that. (Though to be fair, we didn't always have the second form, and its addition might be enough to avoid the need for the Collection and array forms. It also allows reclaiming of the good name "flatMap", since there is actual mapping going on, and the generator form can piggyback on that.)

So, +1 to Remi's implicit suggestion:

    Stream<U> flatMap(FlatMapper<T, U>)
    Stream<U> flatMap(Function<T, Stream<U>>)

That's the new proposal.
Will be carved in stone in 24h unless there is further discussion :) From forax at univ-mlv.fr Wed Feb 6 15:59:04 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 07 Feb 2013 00:59:04 +0100 Subject: explode In-Reply-To: <5112E787.5090809@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: <5112EE48.8060702@univ-mlv.fr> On 02/07/2013 12:30 AM, Brian Goetz wrote: >> You said that we should not use Collection explicitly in the stream API >> hence we don't have toList(), toSet(), or groupBy() but >> collect(toList()), collect(toSet()) or collect(groupingBy) >> and at the same time, for flatMap which will be less used, you want to >> add flatMapToCollection, flatMapToArray. > > Yes, any coupling to Collection is undesirable and has to be > justified. We're currently in a nice place (zero uses of Collection > in Stream) so it would be nice to stay there, and one is a lot worse > than zero. > > But be careful that you try to turn consistency into a goal unto > itself. For example, the use of Collections in Collectors is an ideal > compromise; the important thing is they are out of the core interface > which we expect every aggregate for the next 10+ years to implement, > but are still available for easy use through standalone static helper > methods like groupingBy. This is an ideal balance of giving users > tools to do their job without tying Stream to Collection. > >> I think you should be at least consistent, so either we have an Exploder >> like we have a Collector, >> or we have several overloads for flatMap, groupBy and toList/toSet. 
>
> Personally, I would (fairly strongly) prefer to have only:
>
>     Stream<U> flatMap(FlatMapper<T, U>)
>
> and
>
>     Stream<U> flatMap(Function<T, Stream<U>>)
>
> One can quite easily derive the Collection (and with slightly more
> work, array) cases from the first form (or the second form, with more
> runtime overhead):
>
>     .flatMap((t, sink) -> getColl(t).forEach(sink))
>     .flatMap(t -> getColl(t).stream())
>
> In fact, the first is what we originally had. But then people howled
> that (a) "I can't understand flatMap" and (b) "I think flatMap should
> take a Function<T, Collection<U>>". In our early focus groups, people
> saw the base form of flatMap and universally cried "WTF?" People
> can't understand it. After 100 people make the same comment, you
> start to get that it's a pain point.
>
> So, the proposal I made today attempts to take into account that
> people are not yet ready to understand this form of flatMap, and
> attempts to compromise. But I'll happily retreat from that, and vote
> for just
>
>     Stream<U> flatMap(FlatMapper<T, U>)
>     Stream<U> flatMap(Function<T, Stream<U>>)
>
> It just seemed people weren't OK with that. (Though to be fair, we
> didn't always have the second form, and its addition might be enough
> to avoid the need for the Collection and array forms. It also allows
> reclaiming of the good name "flatMap", since there is actual mapping
> going on, and the generator form can piggyback on that.)
>
> So, +1 to Remi's implicit suggestion:
>
>     Stream<U> flatMap(FlatMapper<T, U>)
>     Stream<U> flatMap(Function<T, Stream<U>>)
>
> That's the new proposal.
>
> Will be carved in stone in 24h unless there is further discussion :)

I will vote for this if FlatMapper also defines static methods to see a function to a collection or to an array as a FlatMapper.

    interface FlatMapper<T, U> {
        public void explodeInto(T t, Consumer<U> consumer);

        public static <T, U> FlatMapper<T, U> explodeCollection(Function<T, Collection<U>> function) {
            return (element, consumer) -> function.apply(element).forEach(consumer);
        }
        ...
} so one can write: stream.flatMap(explodeCollection(Person::getFriends)) R?mi From kevinb at google.com Wed Feb 6 16:05:01 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 6 Feb 2013 16:05:01 -0800 Subject: explode In-Reply-To: <5112E787.5090809@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: On Wed, Feb 6, 2013 at 3:30 PM, Brian Goetz wrote: Stream flatMap(FlatMapper) > Stream flatMap(Function>) > To make sure I understand: would these two behave identically? Would they imaginably perform comparably? foos.stream().flatMap((t, consumer) -> t.somethingThatGivesAStream().forEach(consumer)) foos.stream().flatMap(t -> t.somethingThatGivesAStream()) Second question, why "FlatMapper.OfInt" here, but "IntSupplier" etc. elsewhere? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Wed Feb 6 16:06:37 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 07 Feb 2013 01:06:37 +0100 Subject: One import static to rule them all Message-ID: <5112F00D.4010506@univ-mlv.fr> I wonder if we should not create one artificial interface that extends Collector, FlatMapper, etc, i.e. every interfaces that declare static methods that can be used by the Stream API just because it will be easier to do an import static on this interface. interface StaticDefaults // better name needed extends Collector, FlatMapper { } otherwise, every Java projects will define its own one. 
cheers, R?mi From brian.goetz at oracle.com Wed Feb 6 16:11:42 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 06 Feb 2013 19:11:42 -0500 Subject: explode In-Reply-To: <5112EE48.8060702@univ-mlv.fr> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5112EE48.8060702@univ-mlv.fr> Message-ID: <5112F13E.2060807@oracle.com> > I will vote on this if FlatMapper also defines static methods to see a > function to a collection or to an array as a FlatMapper. Reasonable request. Where would they live? From brian.goetz at oracle.com Wed Feb 6 16:16:15 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 06 Feb 2013 19:16:15 -0500 Subject: explode In-Reply-To: References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: <5112F24F.8080308@oracle.com> > Stream flatMap(FlatMapper) > Stream flatMap(Function>) > > To make sure I understand: would these two behave identically? Would > they imaginably perform comparably? > > foos.stream().flatMap((t, consumer) -> > t.somethingThatGivesAStream().forEach(consumer)) > foos.stream().flatMap(t -> t.somethingThatGivesAStream()) Currently, they would behave identically. The T -> Stream form is not strictly necessary, since it can be written in terms of the other, but people will find it more convenient. One place where they might not behave identically in the future is that since streams are lazy, we might be able to make: integers.flatMap(i -> anInfiniteStream()).getFirst() actually terminate, whereas integers.flatMap((i,consumer) -> anInfiniteStream().forEach(consumer)).getFirst() will never terminate. So the laziness-preserving aspect of Stream is nice. The second would perform basically the same as the first. 
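Since FlatMapper never shipped, the equivalence Kevin asks about can only be checked by emulating the consumer-driven spelling on top of the final API; the sketch below (names are mine) does that with Stream.Builder and confirms the two spellings produce the same elements on a finite stream. The laziness caveat was real in practice: flatMap did not propagate short-circuiting into the inner stream until a fix that landed in JDK 10, so the infinite-inner-stream variant only terminates on newer runtimes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch comparing Kevin's two flatMap spellings on the shipped Java 8 API.
// "somethingThatGivesAStream" is stood in for by splitting a word into letters.
public class TwoSpellings {

    static Stream<String> letters(String s) {
        return Arrays.stream(s.split(""));
    }

    // Spelling 1: consumer-driven (emulating the proposed FlatMapper form).
    static List<String> viaConsumer(List<String> words) {
        BiConsumer<String, Consumer<String>> fm =
                (t, consumer) -> letters(t).forEach(consumer);
        return words.stream()
                .flatMap(t -> {
                    Stream.Builder<String> b = Stream.builder();
                    fm.accept(t, b);
                    return b.build();
                })
                .collect(Collectors.toList());
    }

    // Spelling 2: stream-returning, as shipped.
    static List<String> viaStream(List<String> words) {
        return words.stream()
                .flatMap(TwoSpellings::letters)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("ab", "c");
        System.out.println(viaConsumer(in));
        System.out.println(viaStream(in));
    }
}
```

On finite inputs the two are observably identical; the difference is purely in how eagerly the inner elements are produced.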
But neither would perform as well as actually generating the results directly into the consumer. > Second question, why "FlatMapper.OfInt" here, but "IntSupplier" etc. > elsewhere? Depends where FlatMapper lives. If FlatMapper is a general SAM, then it would go in j.u.f. and we'd definitely use the IntFlatMapper convention. However, I would lean towards making FlatMapper a type in j.u.s., in which case the naming convention more prevalent there is to use nested OfXxx classes. From kevinb at google.com Wed Feb 6 16:28:41 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 6 Feb 2013 16:28:41 -0800 Subject: One import static to rule them all In-Reply-To: <5112F00D.4010506@univ-mlv.fr> References: <5112F00D.4010506@univ-mlv.fr> Message-ID: I have been promised that this won't work -- that to invoke a static method on an interface one *must* refer to the exact interface it was defined on, not a subtype, not an instance. Can someone please confirm this is true? On Wed, Feb 6, 2013 at 4:06 PM, Remi Forax wrote: > I wonder if we should not create one artificial interface that extends > Collector, FlatMapper, etc, > i.e. every interfaces that declare static methods that can be used by the > Stream API > just because it will be easier to do an import static on this interface. > > interface StaticDefaults // better name needed > extends Collector, FlatMapper { > } > > otherwise, every Java projects will define its own one. > > cheers, > R?mi > > -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From forax at univ-mlv.fr Wed Feb 6 16:36:46 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 07 Feb 2013 01:36:46 +0100 Subject: explode In-Reply-To: <5112F13E.2060807@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5112EE48.8060702@univ-mlv.fr> <5112F13E.2060807@oracle.com> Message-ID: <5112F71E.4090509@univ-mlv.fr> On 02/07/2013 01:11 AM, Brian Goetz wrote: >> I will vote on this if FlatMapper also defines static methods to see a >> function to a collection or to an array as a FlatMapper. > > Reasonable request. Where would they live? > I hope that most of the static methods should live in their corresponding interface, it's easier for devs to find them, it's easier for IDE to auto-complete them. R?mi From forax at univ-mlv.fr Wed Feb 6 16:35:29 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 07 Feb 2013 01:35:29 +0100 Subject: One import static to rule them all In-Reply-To: References: <5112F00D.4010506@univ-mlv.fr> Message-ID: <5112F6D1.70803@univ-mlv.fr> On 02/07/2013 01:28 AM, Kevin Bourrillion wrote: > I have been promised that this won't work -- that to invoke a static > method on an interface one /must/ refer to the exact interface it was > defined on, not a subtype, not an instance. Can someone please > confirm this is true? We talk about a call like Interface.staticM(), but we never talk about the static import explicitly. So if someone does a static import on an interface, you suggest that the compiler should see only the static methods declared in the interface and all the static fields declared or inherited from inherited interfaces (for backward compat.) ? R?mi > > > > On Wed, Feb 6, 2013 at 4:06 PM, Remi Forax > wrote: > > I wonder if we should not create one artificial interface that > extends Collector, FlatMapper, etc, > i.e. 
every interfaces that declare static methods that can be used > by the Stream API > just because it will be easier to do an import static on this > interface. > > interface StaticDefaults // better name needed > extends Collector, FlatMapper { > } > > otherwise, every Java projects will define its own one. > > cheers, > R?mi > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From kevinb at google.com Wed Feb 6 18:09:32 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 6 Feb 2013 18:09:32 -0800 Subject: One import static to rule them all In-Reply-To: <5112F6D1.70803@univ-mlv.fr> References: <5112F00D.4010506@univ-mlv.fr> <5112F6D1.70803@univ-mlv.fr> Message-ID: On Wed, Feb 6, 2013 at 4:35 PM, Remi Forax wrote: We talk about a call like Interface.staticM(), but we never talk about the > static import explicitly. > So if someone does a static import on an interface, you suggest that the > compiler should see only the static methods declared in the interface and > all the static fields declared or inherited from inherited interfaces (for > backward compat.) ? > I would certainly expect that what I can static-import, I could also invoke directly, yes. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From maurizio.cimadamore at oracle.com Thu Feb 7 02:29:45 2013 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 07 Feb 2013 10:29:45 +0000 Subject: explode In-Reply-To: <5112D9FC.9010707@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> Message-ID: <51138219.3030001@oracle.com> On 06/02/13 22:32, Brian Goetz wrote: > Guys, we need to close on the open Stream API items relatively soon. > Maybe we're almost there on flatMap. 
>
> Of the alternatives for flatMap below, I think while Alternative B is
> attractive from a client code perspective, I think Alternative A is
> less risky with respect to stressing the compiler (and also introduces
> fewer new types.)

Did you find cases where B doesn't work with the existing strategy? I think it won't stress the compiler more than what map does (actually it will do less so) - so, if we are fine with supporting map, I don't see big problems with this, complexity-wise.

Maurizio

> So, semi-concrete proposal:
>
>     Stream<U> flatMapToCollection(Function<T, Collection<U>>)
>     Stream<U> flatMapToArray(Function<T, U[]>)  // do we even need this?
>     Stream<U> flatMap(Function<T, Stream<U>>)
>     Stream<U> flatMap(FlatMapper<T, U>)
>
> where
>
>     interface FlatMapper<T, U> {
>         void explodeInto(T t, Consumer<U> consumer);
>     }
>
> with specializations for primitives:
>
>     IntStream flatMap(FlatMapper.OfInt)
>     ... etc
>
> We can then position flatMap as the "advanced" version, so from a
> "graduated learning" perspective, people will find fMTC first, if that
> meets their needs, great, and the Javadoc for fMTC can guide them to
> fM for the more advanced cases.
>
> On 2/4/2013 3:37 PM, Brian Goetz wrote:
>>> From this, here's what I think is left to do:
>>> - More work on explode needed
>> ...
>>
>> Circling back to this. Clearly explode() is not done. Let me try and
>> capture all the relevant info in one place.
>>
>> Let's start with some background. Why do we want this method at all?
>> Well, it's really useful! It's fairly common to do things like:
>>
>>     Stream<Order> orders = ...
>>     Stream<LineItem> lineItems  // explicit declaration for clarity
>>         = orders.explode(... order.getLineItems() ...)
>>
>> and it is often desirable to then do streamy things on the stream of
>> line items. Those who have used flatMap in Scala get used to having it
>> quite quickly, and would be very sad if it were taken away. Ask Don how
>> many examples in his katas use it.
>> (Doug will also point out that if you have flatMap, you don't really
>> need map or filter -- see examples in CHM -- since both can be layered
>> atop flatMap, modulo performance concerns.)
>>
>> (It does have the potential to put some stress on the system when an
>> element can be mapped to very large collections, because it sucks you
>> into the problem of nested parallelism. (This is the inverse of another
>> problem we already have, which is when filter stages have very high
>> selectivity, and we end up with a lot of splitting overhead for the
>> number of elements that end up at the tail of the pipeline.) But when
>> mapping an element to a small number of other elements, as is common in
>> a lot of use cases, there is generally no problem here.)
>>
>> Scala has a method flatMap on whatever they'd call Stream<T>, which
>> takes a function
>>
>>     T -> Stream<U>
>>
>> and produces a Stream<U>. More generally, this shape of flatMap applies
>> (and is supported by higher-kinded generics) to many traits in the Scala
>> library, where you have a Foo[A] and the flatMap method takes an A ->
>> Foo[B] and produces a Foo[B]. (Our generics can't capture this.)
>>
>> This is the shape for flatMap that everyone really wants. But here, we
>> run into unfortunate reality: this works great in functional languages
>> with cheap structural types, but that's not Java.
>>
>> I took it as a key design goal for flatMap/mapMulti/explode that: if an
>> element t maps to nothing, the implementation should do as close to zero
>> work as possible.
>>
>> This rules out shaping flatMap as:
>>
>>     Stream<U> flatMap(Function<T, Collection<U>>)
>>
>> because, if you don't already have the collection lying around, the
>> lambdas for this are nasty to write (try writing one, and you'll see
>> what I mean), inefficient to execute, and create work for the library to
>> iterate the result. In the limit, where t maps to the empty collection,
>> creating and iterating an empty collection for each element is nuts.
(By >> contrast, in languages like Haskell, wrapping elements with lists is >> very cheap.) >> >> However, the above shape is desirable as a convenience in the case you >> already do have a collection lying around. So let's put it in the bin >> of "nice conveniences to also deliver when we solve the main problem." >> >> It also rules out shaping flatMap as: >> >> Stream flatMap(Function>) >> >> because that's even worse -- creating ad-hoc streams is even more >> expensive than creating collections. >> >> To simplify, imagine there are two use cases we have to satisfy: >> >> - map element to generator (general case) >> - map element to collection (convenience case) >> >> The other cases (to array, to stream) are similar enough to the >> collection case. >> >> To illustrate the general "generator" case, here's an example of a >> lambda (using the current API) that takes a Stream of String and >> produces a Stream of Integer values which are the characters of that >> stream: >> >> (sink, element) -> { >> for (int i=0; i> sink.send(element.charAt(i)); >> } >> >> It's efficient (no input, no output) and pretty easy to see what's going >> on. Would be nicer if we could spell "sink.send" as "yield", but oh >> well. Here's how we'd have to write that if we didn't support the >> generator case: >> >> (element) -> { >> ArrayList list = new ArrayList<>(); >> for (int i=0; i> list.add(element.charAt(i)); >> return list; >> } >> >> Bigger and less efficient. And it gets uglier if we want to try and >> optimize away the list creation in the empty case: >> >> (element) -> { >> if (element.length() == 0) >> return Collections.emptyList(); >> ArrayList list = new ArrayList<>(); >> for (int i=0; i> list.add(element.charAt(i)); >> return list; >> } >> >> We're really starting to lose sight of what this lambda does. (Hopefully >> this will put to bed the notion that all we need is the T->Collection >> case.) >> >> Erasure plays a role here too. 
Ideally, it would be nice to overload >> methods for >> >> flatMap(Function>) >> flatMap(Function> >> but obviously we can't do that (directly). >> >> >> The original API had only: >> >> Stream flatMap(MultiFunction mf) >> >> where MultiFunction was (T, Consumer) -> void. If users already had >> a Collection lying around, they had to iterate it themselves: >> >> (element, sink) -> { >> for (U u : findCollection(t)) >> sink.accept(u); >> } >> >> which isn't terrible but people didn't like it -- I think not because it >> was hard to read, but hard to figure out how to use flatMap at all. >> >> The current iteration provides a helper class with helper methods for >> handling collections, arrays, and streams, but you still have to wrap >> your head around why you're being passed two things before doing >> anything -- and I think its the "before doing anything" part that >> really messes people up. >> >> >> So, here's two alternatives that I hope may be better (and not run into >> problems with type inference). >> >> Alternative A: overloading on method names. >> >> // Map T -> Collection >> public StreamA explodeToCollection(Function> >> mapper); >> >> // Map T -> U[] >> public StreamA explodeToArray(Function mapper); >> >> // Generator case -- pass a T and a Consumer >> public StreamA explodeToConsumer(BiConsumer> >> mapper); >> >> // Alternate version of generator case -- with named SAM instead >> public StreamA altExplodeToConsumer(Exploder mapper); >> >> interface Exploder { >> void explode(T element, Consumer consumer); >> } >> >> Here, we have various explodeToXxx methods (naming is purely >> illustrative) that defeat the erasure problem. Users seeking the >> T->Collection version can use the appropriate versions with no problem. >> When said users discover that their performance sucks, they have >> motivation to learn to use the more efficient generator version. 
>> >> Usage examples: >> >> StreamA a1 >> = a.explodeToArray(i -> new Integer[] { i }); >> StreamA a2 >> = a.explodeToCollection(i -> Collections.singleton(i)); >> StreamA a3 >> = a.explodeToConsumer((i, sink) -> sink.accept(i)); >> >> >> Alternative B: overload on SAMs. This involves three SAMs: >> >> interface Exploder { >> void explode(T element, Consumer consumer); >> } >> >> interface MapperToCollection >> extends Function> { } >> >> interface MapperToArray extends Function { } >> >> And three overloaded explode() methods: >> >> public StreamB explode(MapperToCollection exploder); >> >> public StreamB explode(MapperToArray exploder); >> >> public StreamB explode(Exploder exploder); >> >> Usage examples: >> >> StreamB b1 = b.explode(i -> new Integer[] { i }); >> StreamB b2 = b.explode(i -> Collections.singleton(i)); >> StreamB b3 = b.explode((i, sink) -> sink.accept(i)); >> >> >> I think the second approach is pretty decent. Users can easily >> understand the first two versions and use them while wrapping their head >> around the third. >> >> From tim at peierls.net Thu Feb 7 10:12:39 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 7 Feb 2013 13:12:39 -0500 Subject: explode In-Reply-To: <5112E787.5090809@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: On Wed, Feb 6, 2013 at 6:30 PM, Brian Goetz wrote: > Stream flatMap(FlatMapper) > Stream flatMap(Function>) > > That's the new proposal. > > Will be carved in stone in 24h unless there is further discussion :) > Still have six hours. :-) I hate to give up verbose/friendly flatMapToCollection entirely. It's not immediately obvious to me how to write it myself, and it feels as though it'll come up a lot. Even just an example in javadocs would help. 
--tim

From brian.goetz at oracle.com Thu Feb 7 10:16:15 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 13:16:15 -0500
Subject: explode
In-Reply-To: 
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com>
Message-ID: <5113EF6F.5060500@oracle.com>

I think the proposed solution there is:
- example in Javadoc, and/or static helper

    static <T, U> FlatMapper<T, U> flatMapperToCollection(Mapper<T, Collection<U>> m) {
        return (t, sink) -> m.apply(t).forEach(sink);
    }

So users can say

    stream.flatMap(flatMapperToCollection(t -> getColl(t)))

and the javadoc can point them to that.

On 2/7/2013 1:12 PM, Tim Peierls wrote:
> On Wed, Feb 6, 2013 at 6:30 PM, Brian Goetz wrote:
>
>     Stream<U> flatMap(FlatMapper<T, U>)
>     Stream<U> flatMap(Function<T, Stream<U>>)
>
>     That's the new proposal.
>
>     Will be carved in stone in 24h unless there is further discussion :)
>
> Still have six hours. :-)
>
> I hate to give up verbose/friendly flatMapToCollection entirely. It's
> not immediately obvious to me how to write it myself, and it feels as
> though it'll come up a lot. Even just an example in javadocs would help.
>
> --tim

From daniel.smith at oracle.com Thu Feb 7 10:22:11 2013
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 7 Feb 2013 11:22:11 -0700
Subject: One import static to rule them all
In-Reply-To: <5112F6D1.70803@univ-mlv.fr>
References: <5112F00D.4010506@univ-mlv.fr> <5112F6D1.70803@univ-mlv.fr>
Message-ID: <5505E6A5-AB90-41D0-A9E0-C8B7BEDDEC9A@oracle.com>

On Feb 6, 2013, at 5:35 PM, Remi Forax wrote:

> On 02/07/2013 01:28 AM, Kevin Bourrillion wrote:
>> I have been promised that this won't work -- that to invoke a static method on an interface one /must/ refer to the exact interface it was defined on, not a subtype, not an instance. Can someone please confirm this is true?
>
> We talk about a call like Interface.staticM(), but we never talk about the static import explicitly.
> So if someone does a static import on an interface, you suggest that the compiler should see only the static methods declared in the interface and all the static fields declared or inherited from inherited interfaces (for backward compat.) ?

The invocation restriction is imposed by defining inheritance such that the subinterface does not inherit its superinterface's members. So the parent's static methods are not members of the child. Static import, in turn, is defined in terms of the members of the child. (In fact, the inability to invoke via a child isn't really the goal -- it's more like a side effect. The main goal, I think, is to avoid lots of pain points that arise when we start to deal with multiple inheritance of static methods.)

--Dan

From tim at peierls.net Thu Feb 7 10:34:08 2013
From: tim at peierls.net (Tim Peierls)
Date: Thu, 7 Feb 2013 13:34:08 -0500
Subject: explode
In-Reply-To: <5113EF6F.5060500@oracle.com>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5113EF6F.5060500@oracle.com>
Message-ID: 

On Thu, Feb 7, 2013 at 1:16 PM, Brian Goetz wrote:

> I think the proposed solution there is:
> - example in Javadoc, and/or static helper...

OK, then. I was still having trouble following the semantics while names were being discussed, so I'm very late to the naming party. "flatMap" doesn't convey much to me, but I guess I could learn to use it. Here's my beef: A roughly analogous name in Guava is the unlovely but crystal clear "transformAndConcat". That's not accurate enough here, since the action is more general than concatenation, but something like "mapAndCollect" conveys the process in the right order -- first map, then collect the results -- which is something that "flatMap" gets backwards: "a flattening of mapped elements".
--tim From tim at peierls.net Thu Feb 7 10:41:10 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 7 Feb 2013 13:41:10 -0500 Subject: Collectors update redux In-Reply-To: <5112D53F.2080205@oracle.com> References: <5112D53F.2080205@oracle.com> Message-ID: Is three-arg collect really the target "on ramp"? I would have thought the first stop would be the combinators. OTOH ... there's a lot of stuff in there. I can think of uses for all of it, but I worry about someone faced with picking the right static factory method of Collectors. Maybe with the right class comment, users can be guided to the right combinator without having to know much. --tim On Wed, Feb 6, 2013 at 5:12 PM, Brian Goetz wrote: > Did more tweaking with Collectors. > > Recall there are two basic forms of the collect method: > > The most basic one is the "on ramp", which doesn't require any > understanding of Collector or the combinators therein; it is basically the > mutable version of reduce. It looks like: > > collect(() -> R, (R,T) -> void, (R,R) -> void) > > The API shape is defined so that most invocations will work with method > references: > > // To ArrayList > collect(ArrayList::new, ArrayList::add, ArrayList::addAll) > > Note that this works in parallel too; we create list at the leaves with > ::add, and merge them up the tree with ::addAll. > > // String concat > collect(StringBuilder::new, StringBuilder::append, > StringBuilder::append) > > // Turn an int stream to a BitSet with those bits set > collect(BitSet::new, BitSet::set, BitSet::or) > > // String join with delimiter > collect(() -> new StringJoiner(", "), StringJoiner::append, > StringJoiner::append) > > Again, all these work in parallel. > > Digression: the various forms of reduce/etc form a ladder in terms of > complexity: > > If you understand reduction, you can understand... > ...reduce(T, BinaryOperator) > > If you understand the above + Optional, you can then understand... 
> ...reduce(BinaryOperator<T>)
>
> If you understand the above + "fold" (nonhomogeneous reduction), you can
> then understand...
> ...reduce(U, BiFunction<U, T, U> accumulator, BinaryOperator<U>)
>
> If you understand the above + "mutable fold" (inject), you can then
> understand...
> ...collect(Supplier<R>, (R,T) -> void, (R,R) -> void)
>
> If you understand the above + "Collector", you can then understand...
> ...collect(Collector<T, R>)
>
> This is all supported by the principle of commensurate effort; learn a
> little more, can do a little more.
>
> OK, exiting digression, moving to the end of the list, those that use
> "canned" Collectors.
>
>     collect(Collector<T, R>)
>     collectUnordered(Collector<T, R>)
>
> Collectors are basically a tuple of three lambdas and a boolean indicating
> whether the Collector can handle concurrent insertion:
>
>     Collector<T, R> = { () -> R, (R,T) -> void, (R,R) -> R, isConcurrent }
>
> Note there is a slight difference in the last argument, a
> BinaryOperator<R> rather than a BiConsumer<R,R>. The BinaryOperator form
> is more flexible (it can support appending two Lists into a tree
> representation without copying the elements, whereas the (R,R) -> void form
> can't.) This asymmetry is a rough edge, though in each case, the shape is
> "locally" optimal (in the three-arg version, the void form supports method
> refs better; in the Collector version, the result is more flexible, and
> that's where we need the flexibility.) But we could make them consistent
> at the cost of the above uses becoming more like:
>
>     collect(StringBuilder::new, StringBuilder::append,
>             (l, r) -> { l.append(r); return l; })
>
> Overall I think the current API yields better client code at the cost of
> this slightly rough edge.
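The three-arg collect survived into Java 8 essentially as described, as Stream.collect(Supplier, BiConsumer, BiConsumer), though StringJoiner's methods ended up named add and merge rather than append. A quick check of the method-reference examples above against the shipped API:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// The "on ramp" collect form, exercised with the method references from
// the message. All of these also work on parallel streams: leaves are
// accumulated element-by-element, then merged up the tree by the combiner.
public class OnRampCollect {

    static List<String> toList(Stream<String> s) {
        return s.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    static String concat(Stream<String> s) {
        return s.collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    // Turn an int stream into a BitSet with those bits set.
    static BitSet toBitSet(IntStream s) {
        return s.collect(BitSet::new, BitSet::set, BitSet::or);
    }

    // String join with delimiter (shipped as add/merge, not append).
    static String join(Stream<String> s) {
        return s.collect(() -> new StringJoiner(", "), StringJoiner::add, StringJoiner::merge)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(toList(Stream.of("a", "b")));
        System.out.println(concat(Stream.of("a", "b")));
        System.out.println(join(Stream.of("a", "b")));
        System.out.println(toBitSet(IntStream.of(1, 3)));
    }
}
```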
> The set of Collectors now includes:
>     toCollection(Supplier<Collection>)
>     toList()
>     toSet()
>     toStringBuilder()
>     toStringJoiner(delimiter)
>
>     // mapping combinators (plus primitive specializations)
>     mapping(T->U, Collector)
>
>     // Single-level groupBy
>     groupingBy(T->K)
>
>     // groupBy with downstream Collector
>     groupingBy(T->K, Collector)
>
>     // grouping plus reduce
>     groupingReduce(T->K, BinaryOperator) // reduce only
>     groupingReduce(T->K, T->U, BinaryOperator) // map+reduce
>
>     // join (nee mappedTo)
>     joiningWith(T -> U) // produces Map<T,U>
>
>     // partition
>     partitioningBy(Predicate)
>     partitioningBy(Predicate, Collector)
>     partitioningReduce(Predicate<T>, BinaryOperator)
>     partitioningReduce(Predicate<T>, T->U, BinaryOperator)
>
>     // statistics (gathers sum, count, min, max, average)
>     toLongStatistics()
>     toDoubleStatistics()
>
> Plus, concurrent versions of most of these (which are suitable for unordered/contended/forEach-style execution.) Plus versions that let you offer explicit constructors for maps and collections. While these may seem like a lot, the implementations are highly compact -- all of these together, plus supporting machinery, fit in 500 LoC.
>
> These Collectors are designed around composability. (It is vaguely frustrating that we even have to separate the "with downstream Collector" versions from the reducing versions.) So they each have a form where you can do some level of categorization and then use a downstream collector to do further computation. This is very powerful.
> Examples, again using the familiar problem domain of transactions:
>
>     class Txn {
>         Buyer buyer();
>         Seller seller();
>         String description();
>         int amount();
>     }
>
> Transactions by buyer:
>
>     Map<Buyer, Collection<Txn>>
>         m = txns.collect(groupingBy(Txn::buyer));
>
> Highest-dollar transaction by buyer:
>
>     Map<Buyer, Txn>
>         m = txns.collect(
>             groupingReduce(Txn::buyer,
>                            Comparators.greaterOf(
>                                Comparators.comparing(Txn::amount))));
>
> Here, comparing() takes the Txn -> amount function, and produces a Comparator<Txn>; greaterOf(comparator) turns that Comparator into a BinaryOperator that corresponds to "max by comparator". We then reduce on that, yielding highest-dollar transaction per buyer.
>
> Alternately, if you want the number, not the transaction:
>
>     Map<Buyer, Integer>
>         m = txns.collect(groupingReduce(Txn::buyer, Txn::amount, Integer::max));
>
> Transactions by buyer, seller:
>
>     Map<Buyer, Map<Seller, Collection<Txn>>>
>         m = txns.collect(groupingBy(Txn::buyer, groupingBy(Txn::seller)));
>
> Transaction volume statistics by buyer, seller:
>
>     Map<Buyer, Map<Seller, LongStatistics>>
>         m = txns.collect(groupingBy(Txn::buyer,
>                                     groupingBy(Txn::seller,
>                                                mapping(Txn::amount,
>                                                        toLongStatistics()))));
>
> The statistics let you get at min, max, sum, count, and average from a single pass on the data (this trick taken from ParallelArray.)
>
> We can mix and match at various levels. For example:
>
> Transactions by buyer, partitioned into "large/small" groups:
>
>     Predicate<Txn> isLarge = t -> t.amount() > BIG;
>     Map<Buyer, Map<Boolean, Collection<Txn>>>
>         m = txns.collect(groupingBy(Txn::buyer, partitioningBy(isLarge)));
>
> Or, turning it around:
>
>     Map<Boolean, Map<Buyer, Collection<Txn>>>
>         m = txns.collect(partitioningBy(isLarge, groupingBy(Txn::buyer)));
>
> Because Collector is public, Kevin can write and publish Guava-multimap-bearing versions of these -- probably in about ten minutes.
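[An aside for readers following this thread against a current JDK: the three-arg "on ramp" shipped in Java 8 with essentially the shape discussed above, Stream.collect(Supplier, BiConsumer, BiConsumer), so the method-reference examples can be run directly. One naming caveat: the released StringJoiner spells its methods add and merge rather than append. A runnable sketch, not part of the original thread:]

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class CollectOnRamp {
    public static void main(String[] args) {
        // To ArrayList: leaves are built with ::add, merged up the tree with ::addAll.
        // Works in parallel; an ordered source keeps its encounter order.
        List<String> list = Stream.of("a", "b", "c")
                .parallel()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
        System.out.println(list);          // [a, b, c]

        // String concat
        StringBuilder sb = Stream.of("x", "y", "z")
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append);
        System.out.println(sb);            // xyz

        // Turn an int stream into a BitSet with those bits set
        BitSet bits = IntStream.of(1, 3, 5)
                .collect(BitSet::new, BitSet::set, BitSet::or);
        System.out.println(bits);          // {1, 3, 5}

        // String join with delimiter (released names: add/merge, not append)
        StringJoiner sj = Stream.of("a", "b", "c")
                .collect(() -> new StringJoiner(", "), StringJoiner::add, StringJoiner::merge);
        System.out.println(sj);            // a, b, c
    }
}
```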
From brian.goetz at oracle.com Thu Feb 7 10:54:36 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 13:54:36 -0500
Subject: explode
In-Reply-To:
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5113EF6F.5060500@oracle.com>
Message-ID: <5113F86C.7070900@oracle.com>

flatMap is indeed map+flatten, but unfortunately we cannot factor it into two steps because of erasure. (We can't make a method on Stream<Stream<T>> called flatten() that produces a Stream<T>.) The name flatMap is used in Scala, and while I'm not suggesting that this constitutes any sort of proof of suitability, it at least has some track record.

DIGRESSION

More generally, flatMap is commonly used to name the bind operator of a monad, where you have a type Foo<T>, and flatMap has the signature:

    Foo<U> flatMap(T -> Foo<U>)

IF we are to use the name flatMap, I feel it is important to at least have one overload that follows this naming pattern.

On 2/7/2013 1:34 PM, Tim Peierls wrote:
> On Thu, Feb 7, 2013 at 1:16 PM, Brian Goetz wrote:
>
>     I think the proposed solution there is:
>     - example in Javadoc, and/or static helper...
>
> OK, then.
>
> I was still having trouble following the semantics while names were being discussed, so I'm very late to the naming party. "flatMap" doesn't convey much to me, but I guess I could learn to use it. Here's my beef: A roughly analogous name in Guava is the unlovely but crystal clear "transformAndConcat". That's not accurate enough here, since the action is more general than concatenation, but something like "mapAndCollect" conveys the process in the right order -- first map, then collect the results -- which is something that "flatMap" gets backwards: "a flattening of mapped elements".
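[The bind shape Brian describes above is the one that eventually shipped: Stream<U> flatMap(Function<T, Stream<U>>). A small sketch of both halves of the map-then-flatten intuition, using made-up sample data; not part of the original thread:]

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapShape {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "c d e");

        // flatMap: each line maps to a stream of words, and the results
        // are flattened into a single Stream<String>.
        List<String> words = lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
        System.out.println(words);           // [a, b, c, d, e]

        // With map alone the result is a Stream<Stream<String>> -- exactly
        // the type that erasure prevents giving its own flatten() method.
        Stream<Stream<String>> nested = lines.stream()
                .map(line -> Arrays.stream(line.split(" ")));
        System.out.println(nested.count());  // 2
    }
}
```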
> > --tim From kevinb at google.com Thu Feb 7 10:56:00 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 7 Feb 2013 10:56:00 -0800 Subject: Collectors update redux In-Reply-To: References: <5112D53F.2080205@oracle.com> Message-ID: On Thu, Feb 7, 2013 at 10:41 AM, Tim Peierls wrote: Is three-arg collect really the target "on ramp"? IF you've been successfully spoon-fed the excellent examples (bitset etc.) then you can see it as reasonably simple. Otherwise you're pretty lost in the woods. > I would have thought the first stop would be the combinators. OTOH ... > there's a lot of stuff in there. I think there is *way* too much stuff in there, and I don't have enough time to even review it all before it gets set in stone. I strongly believe we would be smarter to keep the set of prepackaged Collectors much smaller and let third-party libraries experiment with which Collectors to provide.* * And, no, it's not that I *want* more code that Guava will have to build and maintain. It just seems far safer and more appropriate. JDK only needs the big ones -- a few versions of groupingBy, a few others, done. It's harder to leave out Stream methods, but these are just static things anyone could provide. > I can think of uses for all of it, but I worry about someone faced with > picking the right static factory method of Collectors. Maybe with the right > class comment, users can be guided to the right combinator without having > to know much. > > --tim > > > > On Wed, Feb 6, 2013 at 5:12 PM, Brian Goetz wrote: > >> Did more tweaking with Collectors. >> >> Recall there are two basic forms of the collect method: >> >> The most basic one is the "on ramp", which doesn't require any >> understanding of Collector or the combinators therein; it is basically the >> mutable version of reduce. 
>> It looks like: [...]

-- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From tim at peierls.net Thu Feb 7 10:59:22 2013
From: tim at peierls.net (Tim Peierls)
Date: Thu, 7 Feb 2013 13:59:22 -0500
Subject: explode
In-Reply-To: <5113F86C.7070900@oracle.com>
References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5113EF6F.5060500@oracle.com> <5113F86C.7070900@oracle.com>
Message-ID:

On Thu, Feb 7, 2013 at 1:54 PM, Brian Goetz wrote:
> flatMap is indeed map+flatten, but unfortunately we cannot factor it into two steps because of erasure. (We can't make a method on Stream<Stream<T>> called flatten() that produces a Stream<T>.)

I wasn't suggesting that the steps could be factored, only that a name that suggests the intuitive order of the steps stands a better chance of being understood and used by newbies.

--tim

From brian.goetz at oracle.com Thu Feb 7 11:12:51 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 14:12:51 -0500
Subject: Collectors update redux
In-Reply-To:
References: <5112D53F.2080205@oracle.com>
Message-ID: <5113FCB3.80005@oracle.com>

> Is three-arg collect really the target "on ramp"?

Sorry, I was probably not clear. It is the onramp to the mutable part of the reduce functionality, but it builds on the more functional flavors, as outlined in the "digression" section.

> IF you've been successfully spoon-fed the excellent examples (bitset etc.) then you can see it as reasonably simple. Otherwise you're pretty lost in the woods.

I think that's fair. Which points, as we've already agreed, to the fact that this is mostly a pedagogical problem.

> I would have thought the first stop would be the combinators. OTOH ... there's a lot of stuff in there.
>
> I think there is *way* too much stuff in there, and I don't have enough time to even review it all before it gets set in stone.
> I strongly believe we would be smarter to keep the set of prepackaged Collectors much smaller and let third-party libraries experiment with which Collectors to provide.

Conceptually, the set is pretty simple:

    base collectors == toCollection, toStatistics, toStringBuilder,
                       joinedWith (takes Stream<T> plus T->U, produces Map<T,U>)
    combinator for map+collector
    combinator for groupBy+collector
    combinator for groupBy+reduce
    combinator for partition+collector
    combinator for partition+reduce

plus defaults for the above where, if you don't have a downstream collector, it assumes "toCollection" (e.g., the no-arg groupBy).

Individually, each of these is dead-simple both in concept and implementation (once you understand Collector) -- even the most complex are only 20 LoC, and many are 1-2 LoC. I think what creates the perception of complexity is the number of forms that jumps out at you on the Javadoc page?

The one place where we might consider reducing scope is by eliminating the forms that take an explicit Supplier. In other words, you always get a HashMap / ConcurrentHashMap. This cuts the number of groupBy/join forms in half. But it leaves those who want, say, to group to a TreeMap out in the cold. Do we feel that would be an improvement?

Alternately, we can refactor the Map-driven collectors so that instead of the Supplier being an argument, it can be a method on the Collector:

    collect(groupingBy(Txn::buyer).usingMap(TreeMap::new))

by having a ToMapCollector (extends Collector) with a usingMap() method. This again gets us a nearly 2x reduction in number of methods in Collectors, at the cost of moving the "pick your own map" functionality to somewhere else.
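[For the record, the released Collectors API kept the explicit-Supplier route rather than a usingMap() builder: groupingBy gained a three-argument overload taking a map factory. A sketch of "group to a TreeMap" with a made-up word list; not part of the original thread:]

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupToTreeMap {
    public static void main(String[] args) {
        // Two-arg form: the implementation picks the Map for you.
        Map<Character, List<String>> byFirst =
                Stream.of("cherry", "apple", "banana", "apricot")
                      .collect(Collectors.groupingBy(w -> w.charAt(0)));
        System.out.println(byFirst.get('a'));    // [apple, apricot]

        // Three-arg form: the caller supplies the Map, here a sorted TreeMap.
        TreeMap<Character, List<String>> sorted =
                Stream.of("cherry", "apple", "banana", "apricot")
                      .collect(Collectors.groupingBy(
                              w -> w.charAt(0),       // classifier
                              TreeMap::new,           // explicit map factory
                              Collectors.toList()));  // downstream collector
        System.out.println(sorted.firstKey());   // a
    }
}
```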
From joe.bowbeer at gmail.com Thu Feb 7 11:13:45 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 7 Feb 2013 11:13:45 -0800 Subject: explode In-Reply-To: References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <5113EF6F.5060500@oracle.com> <5113F86C.7070900@oracle.com> Message-ID: I think flatmap is odd and I like Tim's suggestion. However if there is a choice between two odd names then I prefer flatmap because I've encountered it before in various functional contexts. On Feb 7, 2013 10:59 AM, "Tim Peierls" wrote: > On Thu, Feb 7, 2013 at 1:54 PM, Brian Goetz wrote: > >> flatMap is indeed map+flatten, but unfortunately we cannot factor it into >> two steps because of erasure. (We can't make a method on >> Stream> called flatten() that produces a Stream.) >> > > I wasn't suggesting that the steps could be factored, only that a name > that suggests the intuitive order of the steps stands a better chance of > being understood and used by newbies. > > --tim > From brian.goetz at oracle.com Thu Feb 7 11:34:10 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 07 Feb 2013 14:34:10 -0500 Subject: Collectors update redux In-Reply-To: References: <5112D53F.2080205@oracle.com> Message-ID: <511401B2.6040003@oracle.com> > I think there is *way* too much stuff in there, and I don't have enough > time to even review it all before it gets set in stone. "Too much stuff here" is kind of vague. Is the concern that some of the operations (e.g., partition) are just too niche to carry their weight? Or not fully baked as concepts? Or are some so obvious that we just expect people to write it themselves if they need it? Is the concern that there are too many forms of each operation, and that the user will be bewildered by the variety? Is it the complex interaction of {concurrent, ordered}? Can you point to a few examples of methods you would eliminate? 
Maybe we can induct to a pattern from there. From brian.goetz at oracle.com Thu Feb 7 11:53:30 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 07 Feb 2013 14:53:30 -0500 Subject: Fwd: hg: lambda/lambda/jdk: Replace explode() with two forms of flatMap: flatMap(T->Stream), and flatMap(FlatMapper) In-Reply-To: <20130207193737.448A0478E8@hg.openjdk.java.net> References: <20130207193737.448A0478E8@hg.openjdk.java.net> Message-ID: <5114063A.3000203@oracle.com> I pushed an update along the lines of what was discussed yesterday, so people can take a look. Summary: - Eliminated "Downstream" abstraction - Added FlatMapper type (with nested specializations) in j.u.s. - Added five forms of Stream.flatMap flatMap(Function) flatMap(FlatMapper) flatMap(FlatMapper.To{Int,Long,Double}) - Added one form of flatMap for each primitive stream: {Int,Long,Double}Stream.flatMap(FlatMapper.{ILD}To{ILD}) Check it out and see what you think. Commit message attached. I think this is an improvement. Bikeshedding on naming can continue :) -------- Original Message -------- Subject: hg: lambda/lambda/jdk: Replace explode() with two forms of flatMap: flatMap(T->Stream), and flatMap(FlatMapper) Date: Thu, 07 Feb 2013 19:37:14 +0000 From: brian.goetz at oracle.com To: lambda-dev at openjdk.java.net Changeset: 3aed6b4f4d42 Author: briangoetz Date: 2013-02-07 14:36 -0500 URL: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3aed6b4f4d42 Replace explode() with two forms of flatMap: flatMap(T->Stream), and flatMap(FlatMapper) ! src/share/classes/java/util/stream/DoublePipeline.java ! src/share/classes/java/util/stream/DoubleStream.java + src/share/classes/java/util/stream/FlatMapper.java ! src/share/classes/java/util/stream/IntPipeline.java ! src/share/classes/java/util/stream/IntStream.java ! src/share/classes/java/util/stream/LongPipeline.java ! src/share/classes/java/util/stream/LongStream.java ! src/share/classes/java/util/stream/ReferencePipeline.java ! 
src/share/classes/java/util/stream/Stream.java ! test-ng/bootlib/java/util/stream/LambdaTestHelpers.java ! test-ng/boottests/java/util/stream/SpinedBufferTest.java ! test-ng/tests/org/openjdk/tests/java/util/stream/ExplodeOpTest.java ! test-ng/tests/org/openjdk/tests/java/util/stream/ToArrayOpTest.java ! test/java/util/LambdaUtilities.java ! test/java/util/stream/Stream/EmployeeStreamTest.java ! test/java/util/stream/Stream/IntStreamTest.java ! test/java/util/stream/Stream/IntegerStreamTest.java ! test/java/util/stream/Stream/StringBuilderStreamTest.java ! test/java/util/stream/Streams/BasicTest.java From kevinb at google.com Thu Feb 7 12:25:52 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 7 Feb 2013 12:25:52 -0800 Subject: Collectors update redux In-Reply-To: <511401B2.6040003@oracle.com> References: <5112D53F.2080205@oracle.com> <511401B2.6040003@oracle.com> Message-ID: On Thu, Feb 7, 2013 at 11:34 AM, Brian Goetz wrote: I think there is *way* too much stuff in there, and I don't have enough >> time to even review it all before it gets set in stone. >> > > "Too much stuff here" is kind of vague. > > Is the concern that some of the operations (e.g., partition) are just too > niche to carry their weight? Or not fully baked as concepts? > > Or are some so obvious that we just expect people to write it themselves > if they need it? > > Is the concern that there are too many forms of each operation, and that > the user will be bewildered by the variety? > > Is it the complex interaction of {concurrent, ordered}? > > Can you point to a few examples of methods you would eliminate? Maybe we > can induct to a pattern from there. > So... This illustrates the problem I'm talking about. You're implying "we need a specific argument to justify leaving X out" and the further implication is that if you feel you can refute that argument, it stays in. That's the opposite of how it works in my project ... and we actually get to remove our mistakes later! 
Did I miss all the discussions where each of the 40 (!) static Collectors provided was carefully considered on its merits?

-- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com Thu Feb 7 14:24:24 2013
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 7 Feb 2013 14:24:24 -0800
Subject: Collectors update redux
In-Reply-To:
References: <5112D53F.2080205@oracle.com> <511401B2.6040003@oracle.com>
Message-ID:

Okay, I got a presentation over with that was stressing me out and returned to this. :-) I think I've spoken too broadly and been unfair to a degree. I'll start a new thread soon with a more constructive approach.

On Thu, Feb 7, 2013 at 12:25 PM, Kevin Bourrillion wrote:
> [...]
-- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Thu Feb 7 17:11:44 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 20:11:44 -0500
Subject: Collectors update redux
In-Reply-To:
References: <5112D53F.2080205@oracle.com>
Message-ID: <511450D0.1040607@oracle.com>

> I can think of uses for all of it, but I worry about someone faced with picking the right static factory method of Collectors. Maybe with the right class comment, users can be guided to the right combinator without having to know much.

It's worth noting that the only method that is really needed is:

    <R> R reduce(Supplier<R> factory, BiFunction<R,T,R> reducer, BinaryOperator<R> combiner);

All the other forms of reduce/collect can be written in terms of this one -- though some are more awkward than others. Similarly, all the Collectors are just "macros" for specific combinations of inputs to this form of reduce. And, as to the Collectors, groupBy can be written in terms of groupingReduce; partitioning is just grouping with a boolean-valued function; joiningWith is a form of groupingReduce too. We don't *need* any of them. They're all just reductions that can be expressed with the above form.

So we *could* boil everything down to just one method. But, of course, we should not, because the client code gets harder to write, harder to read, and more error-prone. Each "A can be written in terms of B" requires an "aha" that is obvious in hindsight but could well be slow in coming.

So it's really a question of "where do we turn the knob to." The forms of reduce we've got are a (non-orthogonal) set that are (subjectively) tailored to specific categories of perceived-to-be common situations. Similarly, the set of Collectors is based on having scoured various "100 cool examples with " to distill out common use cases.
None of the Collectors add any "power" in the sense they can all be written as raw reduce; but they do add expressiveness. Each one you take away makes some clearly imaginable use case harder. And each one you add moves us closer to combinator overload. For example, suppose we take away mapping(T->U, Collector). The user wants to compute "average sale by salesman". He sees groupBy(Txn::seller), but that gives him a Collection, not what he wants. He sees groupBy(Txn::seller, Collector), and he sees toStatistics which will give him the average/min/max he wants, but he can't bridge the two. So he has to either do it in two passes, or write his own averaging reducer. Which isn't terribly hard but he'd rather re-use the one in the library. Adding in mapping(T->U, Collector) lets him write .collect(groupBy(Txn::seller, mapping(Txn::amount, toLongStatistics))) .getMean() and be done -- and still readable -- and obviously correct. For every single one of these, we could make the argument "we don't need it because it's ten lines of code the user could write if he needs" (all the Collectors are tiny); then again for every single one of them, we could make the argument that it's self-contained and useful for realistic use cases. So in the end the "right" set will be highly subjective. Personally, I think we've got just about the right set of operations, but maybe too many flavors of each. (Note we already took away the flatMap-like flavors of groupBy, where each input element can be mapped to multiple output elements, which already cut the number of combinations in half.) And maybe we could cut back on the variations (e.g., eliminate the forms that let you provide your own Map constructor, and you always just get a HashMap.) Or maybe we have the right forms and flavors, but we need a more Builder-like API to regularize it. Or maybe slicing them differently will be less confusing. Or more confusing. So, constructive input welcome! 
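[The "average sale by salesman" composition described above translates almost verbatim to the names that eventually shipped: mapping survived, the statistics collectors became summarizingInt/summarizingLong, and the statistics object exposes getAverage(). Txn here is a hypothetical stand-in for the thread's example class; not part of the original thread:]

```java
import java.util.IntSummaryStatistics;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AverageSaleBySeller {
    // Hypothetical stand-in for the thread's Txn class
    static final class Txn {
        final String seller;
        final int amount;
        Txn(String seller, int amount) { this.seller = seller; this.amount = amount; }
        String seller() { return seller; }
        int amount() { return amount; }
    }

    public static void main(String[] args) {
        // Group by seller, map each Txn to its amount, gather statistics
        // (min, max, sum, count, average) in a single pass over the data.
        Map<String, IntSummaryStatistics> bySeller =
                Stream.of(new Txn("ann", 100), new Txn("ann", 300), new Txn("bob", 50))
                      .collect(Collectors.groupingBy(
                              Txn::seller,
                              Collectors.mapping(Txn::amount,
                                      Collectors.summarizingInt(Integer::intValue))));

        System.out.println(bySeller.get("ann").getAverage());  // 200.0
        System.out.println(bySeller.get("bob").getCount());    // 1
    }
}
```

(In the released API the mapping step can also be skipped entirely, since summarizingInt takes the extractor directly: groupingBy(Txn::seller, summarizingInt(Txn::amount)).)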
From brian.goetz at oracle.com Thu Feb 7 17:38:56 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 07 Feb 2013 20:38:56 -0500
Subject: Collectors update, bikeshed edition
In-Reply-To: <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com>
References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com>
Message-ID: <51145730.8090409@oracle.com>

Don has been filling my mailbox daily brainstorming alternate names for "collect". If we were to rename collect, the ones that seem most tolerable (largely on the basis of prior art) are "inject" and "fold". Inject is the name used by Smalltalk, Ruby, and Groovy for this. Again, not that I'm claiming that this is any sort of proof of superiority, but some people will be familiar with the name, and that's worth something.

When I first came across "inject" I didn't like it. Its primary value seemed to be that it rhymed with {sel,rej,det,inf,negl}ect. But, like "fold" (in the baking sense), there's a physical analogy of injecting ingredients one at a time into a larger entity or aggregation that absorbs them. It rubs most people the wrong way at first, but you do get used to it, and eventually it makes sense.

Anyway, not to get Don's hopes up, but inject does have one big benefit over collect, and that is the challenges faced by Collector. One of the bad things about the name Collector is that it *doesn't* actually collect things! Instead, it is a template/recipe/scheme for *how* to collect things. But we can't use the word Collection because that clearly means something else, and CollectorTemplate/CollectorStrategy/CollectorScheme all seem too roundabout.
But Injector works better as a name for what we now call Collector; you can convince yourself that a groupBy() injects data into a Map. Or, if you don't like that, the space of InjectionXxx is open (unlike with collect), such as InjectionScheme. I could tolerate switching to inject and some flavor of Injector/InjectionScheme. I could also tolerate fold(), but that is more likely to engender "that's not a fold", and Folder has the same problem as Collection. .NET calls this Aggregate, by the way. And Aggregator is clear too. Though Doug wants us to keep Aggregate free for some future Collection type, and given how rabid I've been about things like syntactic real-estate management, I think I must reluctantly agree. On 1/30/2013 3:42 PM, Raab, Donald wrote: > In my opinion, collect should return a collection. It should not reduce to any result. In the interest of time, here's a stab at an alternative list I came up with using the powers of thesaurus yesterday: > > into > gather > assemble > summarize > > The functionality currently called collect feels more like injectInto/inject in Smalltalk/Ruby/Groovy, but nothing is being injected into the method collect directly, but by the Collector (the R in makeResult()). InjectInto/inject is the equivalent of foldLeft. I would be less concerned over using injectInto or inject than collect, as at least it seems similar enough in that it can return any value, determined by the injector (currently called Collector). But folks here might consider injectInto and foldLeft too cryptic, so I decided to just shorten to into in the above list. > > http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#inject(groovy.lang.Closure) > > In the binary release I have (not sure if this is different in current source), two of the overloaded versions of the method collect create a FoldOp today (a hint), and the Collector interface has a method called accumulate and combine and is called MutableReducer in the javadoc. 
The methods named reduce also create FoldOp instances. This makes reduce and collect seem eerily similar. > > I find this a little confusing, but I have tried my best anyway to name that which by any other name seems to be more like injectInto/mapReduce/foldL/aggregate/etc. to me. > > Thoughts? > >>> I will do >>> my best and find an alternative that everyone else here likes. >> >> Thanks. >> >> -Doug > From david.holmes at oracle.com Thu Feb 7 18:03:47 2013 From: david.holmes at oracle.com (David Holmes) Date: Fri, 08 Feb 2013 12:03:47 +1000 Subject: Collectors update, bikeshed edition In-Reply-To: <51145730.8090409@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> Message-ID: <51145D03.5@oracle.com> I'm coming at this from a position of complete ignorance. I've never learnt functional programming though I had been exposed to some functional style of operations. The names collect/inject/fold are all equally meaningless to me. While I think I know what groupBy does I don't recognize it as a concrete instance of some abstract concept (folding, injection, aggregating etc). So my question is, for people who will learn Java through the primary/traditional channels (i.e. schools, college, university etc), where would they learn the underlying concepts that these APIs pertain to? And what terminology are they most likely to encounter there? FWIW I would much rather have a name with no obvious meaning than a name that I'm likely to think means something quite different to what it is. (unfortunately that is likely to apply to any verb we might use here.)
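One way to make that abstract-concept connection concrete: groupBy is itself a fold, with an empty map as the identity value and each element injected into its bucket. A minimal sketch (Collectors.groupingBy is the name as eventually shipped, which may differ from the draft; the hand-written loop spells out the equivalent fold):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByAsFold {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "cherry");

        // groupBy expressed through a Collector recipe:
        Map<Character, List<String>> grouped = words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0)));

        // The same operation as an explicit fold: an empty map is the
        // identity, and each element is injected into its bucket.
        Map<Character, List<String>> folded = new HashMap<>();
        for (String w : words) {
            folded.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(w);
        }

        System.out.println(grouped.equals(folded)); // true
    }
}
```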
David On 8/02/2013 11:38 AM, Brian Goetz wrote: > Don has been filling my mailbox daily brainstorming alternate names for > "collect". If we were to rename collect, the ones that seems most > tolerable (largely on the basis of prior art) are "inject" and "fold". > Inject is the name used by Smalltalk, Ruby, and Groovy for this. Again, > not that I'm claiming that this is any sort of proof of superiority, but > some people will be familiar with the name, and that's worth something. > > When I first came across "inject" I didn't like it. Its primary value > seemed to that it rhymed with {sel,rej,det,inf,negl}ect. But, like > "fold" (in the baking sense), there's a physical analogy of injecting > ingredients one at a time into a larger entity or aggregation that > absorbs them. It rubs most people the wrong way at first, but you do > get used to it, and eventually it makes sense. > > Anyway, not to get Don's hopes up, but inject does have one big benefit > over collect, and that is the challenges faced by Collector. One of the > bad things about the name Collector is that it *doesn't* actually > collect things! Instead, it is a template/recipe/scheme for *how* to > collect things. But we can't use the word Collection because that > clearly means something else, and > CollectorTemplate/CollectorStrategy/CollectorScheme all seem too > roundabout. > > But Injector works better as a name for what we now call Collector; you > can convince yourself that a groupBy() injects data into a Map. Or, if > you don't like that, the space of InjectionXxx is open (unlike with > collect), such as InjectionScheme. > > I could tolerate switching to inject and some flavor of > Injector/InjectionScheme. I could also tolerate fold(), but that is > more likely to engender "that's not a fold", and Folder has the same > problem as Collection. > > .NET calls this Aggregate, by the way. And Aggregator is clear too. 
> Though Doug wants us to keep Aggregate free for some future Collection > type, and given how rabid I've been about things like syntactic > real-estate management, I think I must reluctantly agree. > > > On 1/30/2013 3:42 PM, Raab, Donald wrote: >> In my opinion, collect should return a collection. It should not >> reduce to any result. In the interest of time, here's a stab at an >> alternative list I came up with using the powers of thesaurus yesterday: >> >> into >> gather >> assemble >> summarize >> >> The functionality currently called collect feels more like >> injectInto/inject in Smalltalk/Ruby/Groovy, but nothing is being >> injected into the method collect directly, but by the Collector (the R >> in makeResult()). InjectInto/inject is the equivalent of foldLeft. I >> would be less concerned over using injectInto or inject than collect, >> as at least it seems similar enough in that it can return any value, >> determined by the injector (currently called Collector). But folks >> here might consider injectInto and foldLeft too cryptic, so I decided >> to just shorten to into in the above list. >> >> http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#inject(groovy.lang.Closure) >> >> >> In the binary release I have (not sure if this is different in current >> source), two of the overloaded versions of the method collect create a >> FoldOp today (a hint), and the Collector interface has a method called >> accumulate and combine and is called MutableReducer in the javadoc. >> The methods named reduce also create FoldOp instances. This makes >> reduce and collect seem eerily similar. >> >> I find this a little confusing, but I have tried my best anyway to >> name that which by any other name seems to be more like >> injectInto/mapReduce/foldL/aggregate/etc. to me. >> >> Thoughts? >> >>>> I will do >>>> my best and find an alternative that everyone else here likes. >>> >>> Thanks. 
>>> >>> -Doug >> From joe.bowbeer at gmail.com Thu Feb 7 18:29:09 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 7 Feb 2013 18:29:09 -0800 Subject: Collectors update, bikeshed edition In-Reply-To: <51145D03.5@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> <51145D03.5@oracle.com> Message-ID: David, I doubt there is a clear answer to your question, though I would point to Ruby, Python and Groovy, in some order, for guidance as to what Java programming students may have been exposed to. This is an interesting take on the "inject" name in Ruby: http://railspikes.com/2008/8/11/understanding-map-and-reduce inject, reduce, and fold have all been used for "reduce" in Ruby. inject is apparently from Smalltalk? I like the section "Why 3+ names". My preferences are guided by how the simple examples "read", and I like the way "collect" reads in the Anagrams example: public static Stream<List<String>> anagrams(Stream<String> words) { return words.parallel().collectUnordered(groupingBy(Anagrams::key)) .values().parallelStream().filter(v -> v.size() > 1); } Based on my own experience learning this complicated API, I think the best approach for teaching these methods will be heavily reliant on a cookbook of simple examples. --Joe On Thu, Feb 7, 2013 at 6:03 PM, David Holmes wrote: > I'm coming at this from a position of complete ignorance. I've never > learnt functional programming though I had been exposed to some functional > style of operations. The names collect/inject/fold are all equally > meaningless to me.
While I think I know what groupBy does I don't recognize > it as a concrete instance of some abstract concept (folding, injection, > aggregating etc). > > So my question is, for people who will learn Java through the > primary/traditional channels (is schools, college, university etc), where > would they learn the underlying concepts that these API's pertain to? And > what terminology are they most likely to encounter there? > > FWIW I would much rather have a name with no obvious meaning than a name > that I'm likely to think means something quite different to what it is. > (unfortunately that is likely to apply to any verb we might use here.) > > David > > > On 8/02/2013 11:38 AM, Brian Goetz wrote: > >> Don has been filling my mailbox daily brainstorming alternate names for >> "collect". If we were to rename collect, the ones that seems most >> tolerable (largely on the basis of prior art) are "inject" and "fold". >> Inject is the name used by Smalltalk, Ruby, and Groovy for this. Again, >> not that I'm claiming that this is any sort of proof of superiority, but >> some people will be familiar with the name, and that's worth something. >> >> When I first came across "inject" I didn't like it. Its primary value >> seemed to that it rhymed with {sel,rej,det,inf,negl}ect. But, like >> "fold" (in the baking sense), there's a physical analogy of injecting >> ingredients one at a time into a larger entity or aggregation that >> absorbs them. It rubs most people the wrong way at first, but you do >> get used to it, and eventually it makes sense. >> >> Anyway, not to get Don's hopes up, but inject does have one big benefit >> over collect, and that is the challenges faced by Collector. One of the >> bad things about the name Collector is that it *doesn't* actually >> collect things! Instead, it is a template/recipe/scheme for *how* to >> collect things. 
But we can't use the word Collection because that >> clearly means something else, and >> CollectorTemplate/CollectorStrategy/CollectorScheme all seem too >> roundabout. >> >> But Injector works better as a name for what we now call Collector; you >> can convince yourself that a groupBy() injects data into a Map. Or, if >> you don't like that, the space of InjectionXxx is open (unlike with >> collect), such as InjectionScheme. >> >> I could tolerate switching to inject and some flavor of >> Injector/InjectionScheme. I could also tolerate fold(), but that is >> more likely to engender "that's not a fold", and Folder has the same >> problem as Collection. >> >> .NET calls this Aggregate, by the way. And Aggregator is clear too. >> Though Doug wants us to keep Aggregate free for some future Collection >> type, and given how rabid I've been about things like syntactic >> real-estate management, I think I must reluctantly agree. >> >> >> On 1/30/2013 3:42 PM, Raab, Donald wrote: >> >>> In my opinion, collect should return a collection. It should not >>> reduce to any result. In the interest of time, here's a stab at an >>> alternative list I came up with using the powers of thesaurus yesterday: >>> >>> into >>> gather >>> assemble >>> summarize >>> >>> The functionality currently called collect feels more like >>> injectInto/inject in Smalltalk/Ruby/Groovy, but nothing is being >>> injected into the method collect directly, but by the Collector (the R >>> in makeResult()). InjectInto/inject is the equivalent of foldLeft. I >>> would be less concerned over using injectInto or inject than collect, >>> as at least it seems similar enough in that it can return any value, >>> determined by the injector (currently called Collector). But folks >>> here might consider injectInto and foldLeft too cryptic, so I decided >>> to just shorten to into in the above list.
>>> >>> http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#inject(groovy.lang.Closure) >>> >>> >>> In the binary release I have (not sure if this is different in current >>> source), two of the overloaded versions of the method collect create a >>> FoldOp today (a hint), and the Collector interface has a method called >>> accumulate and combine and is called MutableReducer in the javadoc. >>> The methods named reduce also create FoldOp instances. This makes >>> reduce and collect seem eerily similar. >>> >>> I find this a little confusing, but I have tried my best anyway to >>> name that which by any other name seems to be more like >>> injectInto/mapReduce/foldL/aggregate/etc. to me. >>> >>> Thoughts? >>> >>> I will do >>>>> my best and find an alternative that everyone else here likes. >>>>> >>>> >>>> Thanks. >>>> >>>> -Doug >>>> >>> >>> From brian.goetz at oracle.com Thu Feb 7 19:09:32 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 07 Feb 2013 22:09:32 -0500 Subject: Collectors update, bikeshed edition In-Reply-To: <51145D03.5@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> <51145D03.5@oracle.com> Message-ID: <51146C6C.6070001@oracle.com> > So my question is, for people who will learn Java through the > primary/traditional channels (is schools, college, university etc), > where would they learn the underlying concepts that these API's pertain > to? And what terminology are they most likely to encounter there? Thanks David, for bringing us back to the primary challenge here: pedagogical.
Obviously we will do what we can in Javadoc (though haven't done so yet), but ultimately this will only scratch the surface. (Just as the Javadoc for JUC only scratched the surface for concurrency concepts. And we all know where that led.) > FWIW I would much rather have a name with no obvious meaning than a name > that I'm likely to think means something quite different to what it is. > (unfortunately that is likely to apply to any verb we might use here.) Right, so "inject" and "grobulate" are equally good by that metric :) From david.holmes at oracle.com Thu Feb 7 19:37:58 2013 From: david.holmes at oracle.com (David Holmes) Date: Fri, 08 Feb 2013 13:37:58 +1000 Subject: Collectors update, bikeshed edition In-Reply-To: <51146C6C.6070001@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> <51145D03.5@oracle.com> <51146C6C.6070001@oracle.com> Message-ID: <51147316.1050107@oracle.com> On 8/02/2013 1:09 PM, Brian Goetz wrote: >> So my question is, for people who will learn Java through the >> primary/traditional channels (is schools, college, university etc), >> where would they learn the underlying concepts that these API's pertain >> to? And what terminology are they most likely to encounter there? > > Thanks David, for bringing us back to the primary challenge here: > pedagogical. Obviously we will do what we can in Javadoc (though > haven't done so yet), but ultimately this will only scratch the surface. > (Just as the Javadoc for JUC only scratched the surface for > concurrency concepts. And we all know where that led.) 
I may be biased but I think we had the easier job with j.u.c >> FWIW I would much rather have a name with no obvious meaning than a name >> that I'm likely to think means something quite different to what it is. >> (unfortunately that is likely to apply to any verb we might use here.) > > Right, so "inject" and "grobulate" are equally good by that metric :) No, I have various notions of inject/injection - and after reading the link Joe posted (thanks Joe!) and some references therefrom, the relationship between inject and actually injecting something seems so tangential to the real functionality that it is obviously a terrible name. grobulate I quite like. ;-) David From kevinb at google.com Thu Feb 7 19:44:19 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 7 Feb 2013 19:44:19 -0800 Subject: Collectors update, bikeshed edition In-Reply-To: <51145730.8090409@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> Message-ID: I come from an environment where usage of @javax.inject.Inject is just utterly ubiquitous, so I really *can't* like inject() for this. On Thu, Feb 7, 2013 at 5:38 PM, Brian Goetz wrote: > Don has been filling my mailbox daily brainstorming alternate names for > "collect". If we were to rename collect, the ones that seems most > tolerable (largely on the basis of prior art) are "inject" and "fold". > Inject is the name used by Smalltalk, Ruby, and Groovy for this. Again, > not that I'm claiming that this is any sort of proof of superiority, but > some people will be familiar with the name, and that's worth something. > > When I first came across "inject" I didn't like it. 
Its primary value > seemed to that it rhymed with {sel,rej,det,inf,negl}ect. But, like "fold" > (in the baking sense), there's a physical analogy of injecting ingredients > one at a time into a larger entity or aggregation that absorbs them. It > rubs most people the wrong way at first, but you do get used to it, and > eventually it makes sense. > > Anyway, not to get Don's hopes up, but inject does have one big benefit > over collect, and that is the challenges faced by Collector. One of the > bad things about the name Collector is that it *doesn't* actually collect > things! Instead, it is a template/recipe/scheme for *how* to collect > things. But we can't use the word Collection because that clearly means > something else, and CollectorTemplate/CollectorStrategy/CollectorScheme > all seem too roundabout. > > But Injector works better as a name for what we now call Collector; you > can convince yourself that a groupBy() injects data into a Map. Or, if you > don't like that, the space of InjectionXxx is open (unlike with collect), > such as InjectionScheme. > > I could tolerate switching to inject and some flavor of > Injector/InjectionScheme. I could also tolerate fold(), but that is more > likely to engender "that's not a fold", and Folder has the same problem as > Collection. > > .NET calls this Aggregate, by the way. And Aggregator is clear too. > Though Doug wants us to keep Aggregate free for some future Collection > type, and given how rabid I've been about things like syntactic real-estate > management, I think I must reluctantly agree. > > > On 1/30/2013 3:42 PM, Raab, Donald wrote: > >> In my opinion, collect should return a collection. It should not reduce >> to any result.
In the interest of time, here's a stab at an alternative >> list I came up with using the powers of thesaurus yesterday: >> >> into >> gather >> assemble >> summarize >> >> The functionality currently called collect feels more like >> injectInto/inject in Smalltalk/Ruby/Groovy, but nothing is being injected >> into the method collect directly, but by the Collector (the R in >> makeResult()). InjectInto/inject is the equivalent of foldLeft. I would >> be less concerned over using injectInto or inject than collect, as at least >> it seems similar enough in that it can return any value, determined by the >> injector (currently called Collector). But folks here might consider >> injectInto and foldLeft too cryptic, so I decided to just shorten to into >> in the above list. >> >> http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#inject(groovy.lang.Closure) >> >> In the binary release I have (not sure if this is different in current >> source), two of the overloaded versions of the method collect create a >> FoldOp today (a hint), and the Collector interface has a method called >> accumulate and combine and is called MutableReducer in the javadoc. The >> methods named reduce also create FoldOp instances. This makes reduce and >> collect seem eerily similar. >> >> I find this a little confusing, but I have tried my best anyway to name >> that which by any other name seems to be more like >> injectInto/mapReduce/foldL/aggregate/etc. to me. >> >> Thoughts? >> >> I will do >>>> my best and find an alternative that everyone else here likes. >>>> >>> >>> Thanks. >>> >>> -Doug >>> >> >> -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From Donald.Raab at gs.com Thu Feb 7 19:49:36 2013 From: Donald.Raab at gs.com (Raab, Donald) Date: Thu, 7 Feb 2013 22:49:36 -0500 Subject: Collectors update, bikeshed edition In-Reply-To: <51145D03.5@oracle.com> References: <51084E8C.2060403@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A88DCE@GSCMAMP09EX.firmwide.corp.gs.com> <5108635F.5030701@cs.oswego.edu> <51086AD8.5090309@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88DDA@GSCMAMP09EX.firmwide.corp.gs.com> <51091B49.6040904@cs.oswego.edu> <6712820CB52CFB4D842561213A77C05404C3A88E74@GSCMAMP09EX.firmwide.corp.gs.com> <51145730.8090409@oracle.com> <51145D03.5@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C3A8932D@GSCMAMP09EX.firmwide.corp.gs.com> > So my question is, for people who will learn Java through the > primary/traditional channels (is schools, college, university etc), > where would they learn the underlying concepts that these API's pertain > to? And what terminology are they most likely to encounter there? Many developers may learn them through a combination of Google search & StackOverflow (often found through Google). http://stackoverflow.com/questions/10875607/comprehensive-list-of-synonyms-for-reduce/10919742#10919742 From brian.goetz at oracle.com Fri Feb 8 07:25:05 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 10:25:05 -0500 Subject: Refactor of Collector interface Message-ID: <511518D1.5050706@oracle.com> FYI: In a recent refactoring, I changed: public interface Collector<T, R> { R makeResult(); void accumulate(R result, T value); R combine(R result, R other); } to public interface Collector<T, R> { Supplier<R> resultSupplier(); BiConsumer<R, T> accumulator(); BinaryOperator<R> combiner(); } Basically, this is a refactoring from typical interface to tuple-of-lambdas.
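Exercised end to end, the tuple-of-lambdas shape looks like this. A sketch using Collector.of from the API as eventually released; relative to the draft above, the released Collector gained a third type parameter, a finisher, and characteristics:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class TupleOfLambdas {
    public static void main(String[] args) {
        // The three lambdas correspond to resultSupplier/accumulator/combiner.
        Collector<String, List<String>, List<String>> toList = Collector.of(
                ArrayList::new,                       // result supplier
                List::add,                            // accumulator
                (a, b) -> { a.addAll(b); return a; }  // combiner
        );

        List<String> out = Stream.of("a", "b", "c").collect(toList);
        System.out.println(out); // [a, b, c]
    }
}
```

Because the Collector is just a bundle of functions, composing or adapting it never has to wrap an interface implementation around another one.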
What I found was that there was a lot of adaptation going on, where something would start out as a lambda, we'd wrap it with a Collector whose method invoked the lambda, then take a method reference to that wrapping method and then later wrap that with another Collector, etc. By keeping access to the functions directly, the Collectors code got simpler and less wrappy, since a lot of functions could just be passed right through without wrapping. And a lot of stupid adapter classes went away. While clearly we don't want all interfaces to evolve this way, this is one where *all* the many layers of manipulations are effectively function composition, and exposing the function-ness made that cleaner and more performant. So while I don't feel completely super-great about it, I think it's enough of a win to keep. From tim at peierls.net Fri Feb 8 07:31:08 2013 From: tim at peierls.net (Tim Peierls) Date: Fri, 8 Feb 2013 10:31:08 -0500 Subject: Refactor of Collector interface In-Reply-To: <511518D1.5050706@oracle.com> References: <511518D1.5050706@oracle.com> Message-ID: That's a good change. You don't need to defend it as a special case, though: I think it's actually clearer the new way. --tim On Fri, Feb 8, 2013 at 10:25 AM, Brian Goetz wrote: > FYI: In a recent refactoring, I changed: > > public interface Collector<T, R> { > R makeResult(); > void accumulate(R result, T value); > R combine(R result, R other); > } > > to > > public interface Collector<T, R> { > Supplier<R> resultSupplier(); > BiConsumer<R, T> accumulator(); > BinaryOperator<R> combiner(); > } > > Basically, this is a refactoring from typical interface to > tuple-of-lambdas. What I found was that there was a lot of adaptation > going on, where something would start out as a lambda, we'd wrap it with a > Collector whose method invoked the lambda, then take a method reference to > that wrapping method and then later wrap that with another Collector, etc.
> By keeping access to the functions directly, the Collectors code got > simpler and less wrappy, since a lot of functions could just be passed > right through without wrapping. And a lot of stupid adapter classes went > away. > > While clearly we don't want all interfaces to evolve this way, this is one > where *all* the many layers of manipulations are effectively function > composition, and exposing the function-ness made that cleaner and more > performant. So while I don't feel completely super-great about it, I think > its enough of a win to keep. > > From brian.goetz at oracle.com Fri Feb 8 07:47:17 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 10:47:17 -0500 Subject: explode In-Reply-To: References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> Message-ID: <51151E05.50908@oracle.com> OK, just to put it all down on "paper" where flatMap landed...are we OK with this? 
java.util.stream.FlatMapper: public interface FlatMapper<T, U> { void explodeInto(T element, Consumer<U> sink); interface ToInt<T> { void explodeInto(T element, IntConsumer sink); } interface ToLong<T> { void explodeInto(T element, LongConsumer sink); } interface ToDouble<T> { void explodeInto(T element, DoubleConsumer sink); } interface OfIntToInt { void explodeInto(int element, IntConsumer sink); } interface OfLongToLong { void explodeInto(long element, LongConsumer sink); } interface OfDoubleToDouble { void explodeInto(double element, DoubleConsumer sink); } } In Stream: <R> Stream<R> flatMap(Function<T, Stream<R>> mapper); <R> Stream<R> flatMap(FlatMapper<T, R> mapper); IntStream flatMap(FlatMapper.ToInt<T> mapper); LongStream flatMap(FlatMapper.ToLong<T> mapper); DoubleStream flatMap(FlatMapper.ToDouble<T> mapper); In IntStream (similar for {Double,Long}Stream): IntStream flatMap(IntFunction<IntStream> mapper); IntStream flatMap(FlatMapper.OfIntToInt mapper); And Remi wants one more static helper method in FlatMap: public static <T, U> FlatMapper<T, U> explodeCollection(Function<T, Collection<U>> function) I think this wraps up the explosive section of our program? On 2/6/2013 7:05 PM, Kevin Bourrillion wrote: > On Wed, Feb 6, 2013 at 3:30 PM, Brian Goetz > wrote: > > Stream<R> flatMap(FlatMapper<T, R>) > > Stream<R> flatMap(Function<T, Stream<R>>) > > > To make sure I understand: would these two behave identically? Would > they imaginably perform comparably? > > foos.stream().flatMap((t, consumer) -> > t.somethingThatGivesAStream().forEach(consumer)) > foos.stream().flatMap(t -> t.somethingThatGivesAStream()) > > Second question, why "FlatMapper.OfInt" here, but "IntSupplier" etc. > elsewhere? > > -- > Kevin Bourrillion | Java Librarian | Google, Inc.
|kevinb at google.com > From kevinb at google.com Fri Feb 8 08:35:06 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 08:35:06 -0800 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: Hmm, it's difficult for me to perceive what these benefits are from looking at the change to Collectors.java, and the file did get 70 lines longer as a result of the change fwiw, and seems to rely more on private abstract base classes that other Collector implementors won't have. (How do you get to side-by-side diff in this thing? I feel quite blind without it and am thus stuck in "I don't get it" mode.) On Fri, Feb 8, 2013 at 8:22 AM, Kevin Bourrillion wrote: > My subjective sense of good Java API design very strongly prefers the > "before" picture here, which I see as a lot more "Java-like", so I'm taking > a closer look. > > I assume that the trade-offs we're weighing here are purely to do with > what it's like to be a Collector implementor, correct? > > > On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz wrote: > >> FYI: In a recent refactoring, I changed: >> >> public interface Collector<T, R> { >> R makeResult(); >> void accumulate(R result, T value); >> R combine(R result, R other); >> } >> >> to >> >> public interface Collector<T, R> { >> Supplier<R> resultSupplier(); >> BiConsumer<R, T> accumulator(); >> BinaryOperator<R> combiner(); >> } >> >> Basically, this is a refactoring from typical interface to >> tuple-of-lambdas. What I found was that there was a lot of adaptation >> going on, where something would start out as a lambda, we'd wrap it with a >> Collector whose method invoked the lambda, then take a method reference to >> that wrapping method and then later wrap that with another Collector, etc. >> By keeping access to the functions directly, the Collectors code got >> simpler and less wrappy, since a lot of functions could just be passed >> right through without wrapping.
And a lot of stupid adapter classes went >> away. >> >> While clearly we don't want all interfaces to evolve this way, this is >> one where *all* the many layers of manipulations are effectively function >> composition, and exposing the function-ness made that cleaner and more >> performant. So while I don't feel completely super-great about it, I think >> its enough of a win to keep. >> >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 08:36:05 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 11:36:05 -0500 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: <51152975.1040305@oracle.com> Your subjective sense is accurate, which is why I brought this up. This may be an example where it is better to depart from the traditional approach. To your question, it depends what you mean by "purely to do with an implementor." Collector *users* are going to be burdened with the performance consequences of multiple layers of wrapping/conversion. The implementation used to be full of alternation between: interface Foo<T, U> { U transform(T t); } class FooAdapter<T, U> { FooAdapter(Function<T, U> lambda) { ... } U transform(T t) { return lambda.apply(t); } } and Function<T, U> parentTransformer = foo::transform; and back again, introducing layers of wrapping even when the function is not changing across layers. On 2/8/2013 11:22 AM, Kevin Bourrillion wrote: > My subjective sense of good Java API design very strongly prefers the > "before" picture here, which I see as a lot more "Java-like", so I'm > taking a closer look. > > I assume that the trade-offs we're weighing here are purely to do with > what it's like to be a Collector implementor, correct?
> > > On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz > wrote: > > FYI: In a recent refactoring, I changed: > > public interface Collector<T, R> { > R makeResult(); > void accumulate(R result, T value); > R combine(R result, R other); > } > > to > > public interface Collector<T, R> { > Supplier<R> resultSupplier(); > BiConsumer<R, T> accumulator(); > BinaryOperator<R> combiner(); > } > > Basically, this is a refactoring from typical interface to > tuple-of-lambdas. What I found was that there was a lot of > adaptation going on, where something would start out as a lambda, > we'd wrap it with a Collector whose method invoked the lambda, then > take a method reference to that wrapping method and then later wrap > that with another Collector, etc. By keeping access to the > functions directly, the Collectors code got simpler and less wrappy, > since a lot of functions could just be passed right through without > wrapping. And a lot of stupid adapter classes went away. > > While clearly we don't want all interfaces to evolve this way, this > is one where *all* the many layers of manipulations are effectively > function composition, and exposing the function-ness made that > cleaner and more performant. So while I don't feel completely > super-great about it, I think its enough of a win to keep. > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From kevinb at google.com Fri Feb 8 08:22:00 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 08:22:00 -0800 Subject: Refactor of Collector interface In-Reply-To: <511518D1.5050706@oracle.com> References: <511518D1.5050706@oracle.com> Message-ID: My subjective sense of good Java API design very strongly prefers the "before" picture here, which I see as a lot more "Java-like", so I'm taking a closer look. I assume that the trade-offs we're weighing here are purely to do with what it's like to be a Collector implementor, correct?
On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz wrote: > FYI: In a recent refactoring, I changed: > > public interface Collector { > R makeResult(); > void accumulate(R result, T value); > R combine(R result, R other); > } > > to > > public interface Collector { > Supplier resultSupplier(); > BiConsumer accumulator(); > BinaryOperator combiner(); > } > > Basically, this is a refactoring from typical interface to > tuple-of-lambdas. What I found was that there was a lot of adaptation > going on, where something would start out as a lambda, we'd wrap it with a > Collector whose method invoked the lambda, then take a method reference to > that wrapping method and then later wrap that with another Collector, etc. > By keeping access to the functions directly, the Collectors code got > simpler and less wrappy, since a lot of functions could just be passed > right through without wrapping. And a lot of stupid adapter classes went > away. > > While clearly we don't want all interfaces to evolve this way, this is one > where *all* the many layers of manipulations are effectively function > composition, and exposing the function-ness made that cleaner and more > performant. So while I don't feel completely super-great about it, I think > its enough of a win to keep. > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 08:39:46 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 11:39:46 -0500 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: <51152A52.8020200@oracle.com> > Hmm, it's difficult for me to perceive what these benefits are from > looking at the change to Collectors.java > , > and the file did get 70 lines longer as a result of the change fwiw, and > seems to rely more on private abstract base classes that other Collector > implementors won't have. 
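[Editor's note: the before/after interfaces quoted above also lost their generics in the archive. Reconstructed — the type parameters are my guess at the draft's intent, and the Collector interface that finally shipped differs from both shapes — the refactoring is:]

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// "Before": a conventional interface; every layer must implement (or
// wrap) these three methods itself.
interface CollectorBefore<T, R> {
    R makeResult();
    void accumulate(R result, T value);
    R combine(R result, R other);
}

// "After": a tuple of lambdas; every layer just hands back functions,
// which combinators can pass straight through without wrapping.
interface CollectorAfter<T, R> {
    Supplier<R> resultSupplier();
    BiConsumer<R, T> accumulator();
    BinaryOperator<R> combiner();
}
```

In the "after" shape a string-joining collector, for instance, is nothing more than StringBuilder::new, StringBuilder::append, and a combiner lambda.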
It actually got smaller when this transform was applied, but more stuff went into Collectors in the same changeset, such as the mapped() combinators. I have no objection to making that abstract base class public if that's a concern, though it's not really necessary since Collector writers can do without it: class FooCollector implements Collector { Supplier resultSupplier() { return Foo::new; } ... } The abstract class is mostly there as a "fake tuple" class for convenience of the Collectors implementation, and I think we're on record as saying that it is reasonable to expect users to write their own fake tuple classes. > (How do you get to side-by-side diff in this thing? I feel quite blind > without it and am thus stuck in "I don't get it" mode.) > > > On Fri, Feb 8, 2013 at 8:22 AM, Kevin Bourrillion > wrote: > > My subjective sense of good Java API design very strongly prefers > the "before" picture here, which I see as a lot more "Java-like", so > I'm taking a closer look. > > I assume that the trade-offs we're weighing here are purely to do > with what it's like to be a Collector implementor, correct? > > > On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz > wrote: > > FYI: In a recent refactoring, I changed: > > public interface Collector { > R makeResult(); > void accumulate(R result, T value); > R combine(R result, R other); > } > > to > > public interface Collector { > Supplier resultSupplier(); > BiConsumer accumulator(); > BinaryOperator combiner(); > } > > Basically, this is a refactoring from typical interface to > tuple-of-lambdas. What I found was that there was a lot of > adaptation going on, where something would start out as a > lambda, we'd wrap it with a Collector whose method invoked the > lambda, then take a method reference to that wrapping method and > then later wrap > that with another Collector, etc. 
By keeping > access to the functions directly, the Collectors code got > simpler and less wrappy, since a lot of functions could just be > passed right through without wrapping. And a lot of stupid > adapter classes went away. > > While clearly we don't want all interfaces to evolve this way, > this is one where *all* the many layers of manipulations are > effectively function composition, and exposing the function-ness > made that cleaner and more performant. So while I don't feel > completely super-great about it, I think its enough of a win to > keep. > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From kevinb at google.com Fri Feb 8 08:43:35 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 08:43:35 -0800 Subject: Refactor of Collector interface In-Reply-To: <51152975.1040305@oracle.com> References: <511518D1.5050706@oracle.com> <51152975.1040305@oracle.com> Message-ID: Oh, it's about performance. I see that now. Well, if it's possible to just tell us, "Hey, a group-by of 10000 elements used to incur N bytes of garbage and now causes only M," that's very easy to know how to react to. On Fri, Feb 8, 2013 at 8:36 AM, Brian Goetz wrote: > Your subjective sense is accurate, which is why I brought this up. This > may be an example where is better to depart from the traditional approach. > > To your question, it depends what you mean by "purely to do with an > implementor." Collector *users* are going to be burdened with the > performance consequences of multiple layers of wrapping/conversion. > > The implementation used to be full of alternation between: > > interface Foo { > U transform(T t); > } > > class FooAdapter { > FooAdapter(Function lambda) { ... 
} > > U transform(T t) { return lambda.apply(t); } > } > > and > > Function parentTransformer = foo::transform; > > and back again, introducing layers of wrapping even when the function is > not changing across layers. > > > > > On 2/8/2013 11:22 AM, Kevin Bourrillion wrote: > >> My subjective sense of good Java API design very strongly prefers the >> "before" picture here, which I see as a lot more "Java-like", so I'm >> taking a closer look. >> >> I assume that the trade-offs we're weighing here are purely to do with >> what it's like to be a Collector implementor, correct? >> >> >> On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz > > wrote: >> >> FYI: In a recent refactoring, I changed: >> >> public interface Collector { >> R makeResult(); >> void accumulate(R result, T value); >> R combine(R result, R other); >> } >> >> to >> >> public interface Collector { >> Supplier resultSupplier(); >> BiConsumer accumulator(); >> BinaryOperator combiner(); >> } >> >> Basically, this is a refactoring from typical interface to >> tuple-of-lambdas. What I found was that there was a lot of >> adaptation going on, where something would start out as a lambda, >> we'd wrap it with a Collector whose method invoked the lambda, then >> take a method reference to that wrapping method and then later wrap >> that with another Collector, etc. By keeping access to the >> functions directly, the Collectors code got simpler and less wrappy, >> since a lot of functions could just be passed right through without >> wrapping. And a lot of stupid adapter classes went away. >> >> While clearly we don't want all interfaces to evolve this way, this >> is one where *all* the many layers of manipulations are effectively >> function composition, and exposing the function-ness made that >> cleaner and more performant. So while I don't feel completely >> super-great about it, I think its enough of a win to keep. >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> >> > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From tim at peierls.net Fri Feb 8 09:13:09 2013 From: tim at peierls.net (Tim Peierls) Date: Fri, 8 Feb 2013 12:13:09 -0500 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: On Fri, Feb 8, 2013 at 11:22 AM, Kevin Bourrillion wrote: > My subjective sense of good Java API design very strongly prefers the > "before" picture here, which I see as a lot more "Java-like", so I'm taking > a closer look. The before picture is certainly more pre-lambda-Java-like, but I don't think it's fair to knock something meant to fit well with a new language feature by those rules. I thought the return types of the after picture conveyed more clearly the idea of "I'm going to need a way to supply result objects, and way to accumulate elements into result objects, and a way to combine result objects." And seeing those interface types as return types reinforced my understanding of those types. I assume that the trade-offs we're weighing here are purely to do with what > it's like to be a Collector implementor, correct? > Well, since I persist in preferring the after picture -- maybe the impending blizzard has addled my senses -- I'd say the benefit to Collector implementers is secondary. --tim From kevinb at google.com Fri Feb 8 09:30:45 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 09:30:45 -0800 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: On Fri, Feb 8, 2013 at 9:13 AM, Tim Peierls wrote: My subjective sense of good Java API design very strongly prefers the >> "before" picture here, which I see as a lot more "Java-like", so I'm taking >> a closer look. > > > The before picture is certainly more pre-lambda-Java-like, but I don't > think it's fair to knock something meant to fit well with a new language > feature by those rules. 
> I think I'm only really saying the same thing Brian is when he says "While clearly we don't want all interfaces to evolve this way..." and "while I don't feel completely super-great about it....", etc. I'd prefer to not rely on the taste argument if we can treat the benefits concretely. > > I thought the return types of the after picture conveyed more clearly the > idea of "I'm going to need a way to supply result objects, and way to > accumulate elements into result objects, and a way to combine result > objects." And seeing those interface types as return types reinforced my > understanding of those types. > > > I assume that the trade-offs we're weighing here are purely to do with >> what it's like to be a Collector implementor, correct? >> > > Well, since I persist in preferring the after picture -- maybe the > impending blizzard has addled my senses -- I'd say the benefit to Collector > implementers is secondary. > > --tim > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From tim at peierls.net Fri Feb 8 10:41:22 2013 From: tim at peierls.net (Tim Peierls) Date: Fri, 8 Feb 2013 13:41:22 -0500 Subject: Refactor of Collector interface In-Reply-To: References: <511518D1.5050706@oracle.com> Message-ID: OK, throwing away the taste argument. And I don't feel completely super-great about anything, so I'm right there. On Fri, Feb 8, 2013 at 12:30 PM, Kevin Bourrillion wrote: > On Fri, Feb 8, 2013 at 9:13 AM, Tim Peierls wrote: > > My subjective sense of good Java API design very strongly prefers the >>> "before" picture here, which I see as a lot more "Java-like", so I'm taking >>> a closer look. >> >> >> The before picture is certainly more pre-lambda-Java-like, but I don't >> think it's fair to knock something meant to fit well with a new language >> feature by those rules. >> > > I think I'm only really saying the same thing Brian is when he says "While > clearly we don't want all interfaces to evolve this way..." 
and "while I > don't feel completely super-great about it....", etc. > > I'd prefer to not rely on the taste argument if we can treat the benefits > concretely. > > > >> >> I thought the return types of the after picture conveyed more clearly the >> idea of "I'm going to need a way to supply result objects, and way to >> accumulate elements into result objects, and a way to combine result >> objects." And seeing those interface types as return types reinforced my >> understanding of those types. >> >> >> I assume that the trade-offs we're weighing here are purely to do with >>> what it's like to be a Collector implementor, correct? >>> >> >> Well, since I persist in preferring the after picture -- maybe the >> impending blizzard has addled my senses -- I'd say the benefit to Collector >> implementers is secondary. >> >> --tim >> > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From tim at peierls.net Fri Feb 8 12:55:00 2013 From: tim at peierls.net (Tim Peierls) Date: Fri, 8 Feb 2013 15:55:00 -0500 Subject: explode In-Reply-To: <51151E05.50908@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <51151E05.50908@oracle.com> Message-ID: Modulo the names, seems reasonable. Don't know why the extra static method, but it doesn't wreck things for me. On Fri, Feb 8, 2013 at 10:47 AM, Brian Goetz wrote: > OK, just to put it all down on "paper" where flatMap landed...are we OK > with this? 
> > java.util.stream.FlatMapper: > > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > > In Stream: > > Stream flatMap(Function> mapper); > > Stream flatMap(FlatMapper mapper); > > IntStream flatMap(FlatMapper.ToInt mapper); > LongStream flatMap(FlatMapper.ToLong mapper); > DoubleStream flatMap(FlatMapper.ToDouble mapper); > > In IntStream (similar for {Double,Long}Stream): > > IntStream flatMap(IntFunction mapper); > IntStream flatMap(FlatMapper.OfIntToInt mapper); > > > And Remi wants one more static helper method in FlatMap: > > > public static FlatMapper > explodeCollection(Function> > function) > > > I think this wraps up the explosive section of our program? > > > > On 2/6/2013 7:05 PM, Kevin Bourrillion wrote: > >> On Wed, Feb 6, 2013 at 3:30 PM, Brian Goetz > > wrote: >> >> Stream flatMap(FlatMapper) >> >> Stream flatMap(Function>) >> >> >> To make sure I understand: would these two behave identically? Would >> they imaginably perform comparably? >> >> foos.stream().flatMap((t, consumer) -> >> t.somethingThatGivesAStream().**forEach(consumer)) >> foos.stream().flatMap(t -> t.somethingThatGivesAStream()) >> >> Second question, why "FlatMapper.OfInt" here, but "IntSupplier" etc. >> elsewhere? >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
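[Editor's note: the FlatMapper proposal above lost its type parameters in the archive. A reconstruction of the draft shape — this is the API under discussion here, not the one that eventually shipped; Java 8 kept only the Function-returning-a-Stream form of flatMap — could read:]

```java
import java.util.function.Consumer;
import java.util.function.DoubleConsumer;
import java.util.function.IntConsumer;
import java.util.function.LongConsumer;

interface FlatMapper<T, U> {
    // Push zero or more results for this element into the downstream sink,
    // instead of allocating an intermediate Stream per element.
    void explodeInto(T element, Consumer<U> sink);

    interface ToInt<T> {
        void explodeInto(T element, IntConsumer sink);
    }

    interface ToLong<T> {
        void explodeInto(T element, LongConsumer sink);
    }

    interface ToDouble<T> {
        void explodeInto(T element, DoubleConsumer sink);
    }

    interface OfIntToInt {
        void explodeInto(int element, IntConsumer sink);
    }

    interface OfLongToLong {
        void explodeInto(long element, LongConsumer sink);
    }

    interface OfDoubleToDouble {
        void explodeInto(double element, DoubleConsumer sink);
    }
}
```

The sink-pushing shape answers Kevin's quoted question: the two flatMap overloads would behave identically, but the FlatMapper form can skip per-element Stream creation.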
|kevinb at google.com >> >> > From brian.goetz at oracle.com Fri Feb 8 14:30:14 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 17:30:14 -0500 Subject: stream() / parallelStream() methods Message-ID: <51157C76.8040303@oracle.com> Currently, we define stream() and parallelStream() on Collection, with default of: default Stream stream() { return Streams.stream(() -> Streams.spliterator(iterator(), size(), Spliterator.SIZED), Spliterator.SIZED); } default Stream parallelStream() { return stream().parallel(); } So the default behavior is "get an Iterator, turn it into a Spliterator, and turn that into a Stream." Then the specific Collection classes generally override it, providing better Spliterator implementations and more precise flag sets. Several people have requested moving stream/parallelStream up to Iterable, on the theory that (a) the default implementations that would live there are not terrible (only difference between that and Collection default is Iterable doesn't know size()), (b) Collection could still override with the size-injecting version, and (c) a lot of APIs are designed to return Iterable as the "least common denominator" aggregate, and being able to stream them would be useful. I don't see any problem with moving these methods up to Iterable. Any objections? From kevinb at google.com Fri Feb 8 15:20:44 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:20:44 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51157C76.8040303@oracle.com> References: <51157C76.8040303@oracle.com> Message-ID: Yeah, I think we have little choice but to do this. It makes sense, and without it, Guava will just end up having to offer a static helper method to return (iterable instanceof Collection) ? ((Collection) iterable).stream() : Streams.stream(Streams.spliteratorUnknownSize(iterable.iterator()), 0). Blech. 
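[Editor's note: Kevin's static helper, written out against the class and method names that eventually shipped in java.util rather than the draft's Streams.* factories; MoreStreams is a hypothetical holder class:]

```java
import java.util.Collection;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class MoreStreams {
    private MoreStreams() {}

    // Prefer the collection's own stream() (which knows its size and splits
    // well); otherwise fall back to an unknown-size spliterator over the
    // iterator, with no characteristic flags.
    @SuppressWarnings("unchecked")
    static <T> Stream<T> stream(Iterable<T> iterable) {
        return (iterable instanceof Collection)
                ? ((Collection<T>) iterable).stream()
                : StreamSupport.stream(
                        Spliterators.spliteratorUnknownSize(iterable.iterator(), 0),
                        false);
    }
}
```

Moving stream() up to Iterable makes exactly this dispatch unnecessary, which is Kevin's point.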
(Tangentially, I would really love to drop parallelStream() and let people call stream().parallel(). But I haven't managed to scour the archives to find if that argument's already suitably played out.) On Fri, Feb 8, 2013 at 2:30 PM, Brian Goetz wrote: > Currently, we define stream() and parallelStream() on Collection, with > default of: > > default Stream stream() { > return Streams.stream(() -> Streams.spliterator(iterator()**, > size(), Spliterator.SIZED), Spliterator.SIZED); > } > > default Stream parallelStream() { > return stream().parallel(); > } > > So the default behavior is "get an Iterator, turn it into a Spliterator, > and turn that into a Stream." Then the specific Collection classes > generally override it, providing better Spliterator implementations and > more precise flag sets. > > > Several people have requested moving stream/parallelStream up to Iterable, > on the theory that (a) the default implementations that would live there > are not terrible (only difference between that and Collection default is > Iterable doesn't know size()), (b) Collection could still override with the > size-injecting version, and (c) a lot of APIs are designed to return > Iterable as the "least common denominator" aggregate, and being able to > stream them would be useful. I don't see any problem with moving these > methods up to Iterable. > > Any objections? > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 15:24:02 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 18:24:02 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> Message-ID: <51158912.7030005@oracle.com> > (Tangentially, I would really love to drop parallelStream() and let > people call stream().parallel(). But I haven't managed to scour the > archives to find if that argument's already suitably played out.) 
Direct version is more performant, in that it requires less wrapping (to turn a stream into a parallel stream, you have to first create the sequential stream, then transfer ownership of its state into a new Stream.) But, inconsistently, we have dropped a number of parallel stream factories along the same lines, because the 2x explosion of intGenerator/parallelIntGenerator was too much. But considering this is just one new method in Iterable/Collection, and it does make a difference in a common case, the status quo does seem reasonable. From kevinb at google.com Fri Feb 8 15:23:10 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:23:10 -0800 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> Message-ID: Can we make our best attempt to specify Iterable.stream() better than Iterable.iterator() was? I haven't worked out how to *say* this yet, but the idea is: - If at all possible to ensure that each call to stream() returns an actual working and *independent* stream, you really really should do that. - If that's just not possible, the second call to stream() really really should throw ISE. (Yes, I do realize most Iterables by far will just inherit stream(), so it will only be as repeat-usable as iterator() is.) On Fri, Feb 8, 2013 at 3:20 PM, Kevin Bourrillion wrote: > Yeah, I think we have little choice but to do this. It makes sense, and > without it, Guava will just end up having to offer a static helper method > to return (iterable instanceof Collection) ? ((Collection) > iterable).stream() : > Streams.stream(Streams.spliteratorUnknownSize(iterable.iterator()), 0). > Blech. > > (Tangentially, I would really love to drop parallelStream() and let people > call stream().parallel(). But I haven't managed to scour the archives to > find if that argument's already suitably played out.) 
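[Editor's note: one way to realize the "second call really really should throw ISE" idea Kevin floats above — nothing like this was specified; OneShotIterable is a hypothetical name:]

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

// Wraps a single Iterator as an Iterable that can be traversed exactly
// once; a second call to iterator() (and hence a second call to an
// inherited default stream()) fails fast instead of silently misbehaving.
class OneShotIterable<T> implements Iterable<T> {
    private final Iterator<T> iterator;
    private final AtomicBoolean consumed = new AtomicBoolean();

    OneShotIterable(Iterator<T> iterator) {
        this.iterator = iterator;
    }

    @Override
    public Iterator<T> iterator() {
        if (!consumed.compareAndSet(false, true)) {
            throw new IllegalStateException("this Iterable was already consumed");
        }
        return iterator;
    }
}
```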
> > > > On Fri, Feb 8, 2013 at 2:30 PM, Brian Goetz wrote: > >> Currently, we define stream() and parallelStream() on Collection, with >> default of: >> >> default Stream stream() { >> return Streams.stream(() -> Streams.spliterator(iterator()**, >> size(), Spliterator.SIZED), Spliterator.SIZED); >> } >> >> default Stream parallelStream() { >> return stream().parallel(); >> } >> >> So the default behavior is "get an Iterator, turn it into a Spliterator, >> and turn that into a Stream." Then the specific Collection classes >> generally override it, providing better Spliterator implementations and >> more precise flag sets. >> >> >> Several people have requested moving stream/parallelStream up to >> Iterable, on the theory that (a) the default implementations that would >> live there are not terrible (only difference between that and Collection >> default is Iterable doesn't know size()), (b) Collection could still >> override with the size-injecting version, and (c) a lot of APIs are >> designed to return Iterable as the "least common denominator" aggregate, >> and being able to stream them would be useful. I don't see any problem >> with moving these methods up to Iterable. >> >> Any objections? >> >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Fri Feb 8 15:25:19 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:25:19 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51158912.7030005@oracle.com> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> Message-ID: On Fri, Feb 8, 2013 at 3:24 PM, Brian Goetz wrote: (Tangentially, I would really love to drop parallelStream() and let >> people call stream().parallel(). But I haven't managed to scour the >> archives to find if that argument's already suitably played out.) 
>> > > Direct version is more performant, in that it requires less wrapping (to > turn a stream into a parallel stream, you have to first create the > sequential stream, then transfer ownership of its state into a new Stream.) > But really a lot of *work* has already happened by then? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 15:28:22 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 18:28:22 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> Message-ID: <51158A16.4000703@oracle.com> Depends how seriously you are counting. Doug counts individual object creations and virtual invocations on the way to a parallel operation, because until you start forking, you're on the wrong side of Amdahl's law -- this is all "serial fraction" that happens before you can fork any work, which pushes your breakeven threshold further out. So getting the setup path for parallel ops fast is valuable. On 2/8/2013 6:25 PM, Kevin Bourrillion wrote: > On Fri, Feb 8, 2013 at 3:24 PM, Brian Goetz > wrote: > > (Tangentially, I would really love to drop parallelStream() and let > people call stream().parallel(). But I haven't managed to scour the > archives to find if that argument's already suitably played out.) > > > Direct version is more performant, in that it requires less wrapping > (to turn a stream into a parallel stream, you have to first create > the sequential stream, then transfer ownership of its state into a > new Stream.) > > > But really a lot of /work/ has already happened by then? > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > From kevinb at google.com Fri Feb 8 15:28:46 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:28:46 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51157C76.8040303@oracle.com> References: <51157C76.8040303@oracle.com> Message-ID: Here's the other issue this raises. To my knowledge there's no Streamable interface defined. Maybe it wasn't needed; I'm not sure. But once Iterable looks like this, now Iterable becomes the new Streamable. If you support a stream(), you'll implement Iterable to expose that fact. This is a little bit weird. I'm undecided on how big a problem it would be, but overall, Streamable seems like a pretty normal thing to have. On Fri, Feb 8, 2013 at 2:30 PM, Brian Goetz wrote: > Currently, we define stream() and parallelStream() on Collection, with > default of: > > default Stream stream() { > return Streams.stream(() -> Streams.spliterator(iterator()**, > size(), Spliterator.SIZED), Spliterator.SIZED); > } > > default Stream parallelStream() { > return stream().parallel(); > } > > So the default behavior is "get an Iterator, turn it into a Spliterator, > and turn that into a Stream." Then the specific Collection classes > generally override it, providing better Spliterator implementations and > more precise flag sets. > > > Several people have requested moving stream/parallelStream up to Iterable, > on the theory that (a) the default implementations that would live there > are not terrible (only difference between that and Collection default is > Iterable doesn't know size()), (b) Collection could still override with the > size-injecting version, and (c) a lot of APIs are designed to return > Iterable as the "least common denominator" aggregate, and being able to > stream them would be useful. I don't see any problem with moving these > methods up to Iterable. > > Any objections? > > -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From brian.goetz at oracle.com Fri Feb 8 15:32:08 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 18:32:08 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> Message-ID: <51158AF8.5010509@oracle.com> > Here's the other issue this raises. > > To my knowledge there's no Streamable interface defined. Right. Earlier drafts had one (ask Doug to recount the "OMG so many interfaces" horror of Iteration 2), and since then we've been working really hard to eliminate each incremental public type, as each adds API surface area. I think we've been really successful at this; I'd hate to slide backwards. > Maybe it > wasn't needed; I'm not sure. But once Iterable looks like this, now > Iterable becomes the new Streamable. If you support a stream(), you'll > implement Iterable to expose that fact. This is a little bit weird. > I'm undecided on how big a problem it would be, but overall, > Streamable seems like a pretty normal thing to have. Leading question: if everything that is Iterable is effectively Streamable (because Iterable has a stream()) method, and everything Streamable is effectively Iterable (because you can turn a Spliterator into an Iterator), aren't they then the same abstraction? From kevinb at google.com Fri Feb 8 15:35:08 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:35:08 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51158A16.4000703@oracle.com> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> Message-ID: Doug, I am extraordinarily unmoved by this concern. :-) Does a break-even point moving a few elements in either direction really matter? On Fri, Feb 8, 2013 at 3:28 PM, Brian Goetz wrote: Depends how seriously you are counting. 
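[Editor's note: Brian's "same abstraction" point can be made concrete with the adapters that shipped in java.util — Spliterators provides both directions of the round trip:]

```java
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class SameAbstraction {
    // Iterable -> Stream: any iterator can back a (sequential) stream.
    static <T> Stream<T> toStream(Iterable<T> iterable) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterable.iterator(), 0),
                false);
    }

    // Stream -> Iterable: any spliterator can be walked as an iterator.
    // (One-shot: each iterator() call consumes the captured stream.)
    static <T> Iterable<T> toIterable(Stream<T> stream) {
        return () -> Spliterators.iterator(stream.spliterator());
    }
}
```

Since each view converts losslessly into the other, a separate Streamable interface would name the same abstraction Iterable already does.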
Doug counts individual object > creations and virtual invocations on the way to a parallel operation, > because until you start forking, you're on the wrong side of Amdahl's law > -- this is all "serial fraction" that happens before you can fork any work, > which pushes your breakeven threshold further out. So getting the setup > path for parallel ops fast is valuable. > > > On 2/8/2013 6:25 PM, Kevin Bourrillion wrote: > >> On Fri, Feb 8, 2013 at 3:24 PM, Brian Goetz > > wrote: >> >> (Tangentially, I would really love to drop parallelStream() and >> let >> people call stream().parallel(). But I haven't managed to scour >> the >> archives to find if that argument's already suitably played out.) >> >> >> Direct version is more performant, in that it requires less wrapping >> (to turn a stream into a parallel stream, you have to first create >> the sequential stream, then transfer ownership of its state into a >> new Stream.) >> >> >> But really a lot of /work/ has already happened by then? >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >> >> > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Fri Feb 8 15:39:06 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:39:06 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51158AF8.5010509@oracle.com> References: <51157C76.8040303@oracle.com> <51158AF8.5010509@oracle.com> Message-ID: On Fri, Feb 8, 2013 at 3:32 PM, Brian Goetz wrote: Here's the other issue this raises. >> >> To my knowledge there's no Streamable interface defined. >> > > Right. Earlier drafts had one (ask Doug to recount the "OMG so many > interfaces" horror of Iteration 2), and since then we've been working > really hard to eliminate each incremental public type, as each adds API > surface area. I think we've been really successful at this; I'd hate to > slide backwards. > > > Maybe it >> wasn't needed; I'm not sure. 
But once Iterable looks like this, now >> Iterable becomes the new Streamable. If you support a stream(), you'll >> implement Iterable to expose that fact. This is a little bit weird. >> I'm undecided on how big a problem it would be, but overall, >> Streamable seems like a pretty normal thing to have. >> > > Leading question: if everything that is Iterable is effectively Streamable > (because Iterable has a stream()) method, and everything Streamable is > effectively Iterable (because you can turn a Spliterator into an Iterator), > aren't they then the same abstraction? > Yes: just making sure we really want that. If I fail in my bid to kill parallelStream() then could we at least keep it on Collection? With Iterable already growing from 1 to 3 methods, that one extra is pretty significant bulk. (Still, let's kill it entirely :-)) -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From joe.bowbeer at gmail.com Fri Feb 8 15:41:32 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Fri, 8 Feb 2013 15:41:32 -0800 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> Message-ID: This concern over reuse rings a bell. Are these the concerns that led us *not* to burden Iterable with these methods? We haven't talked about ISE in months, so we did something right :) On Fri, Feb 8, 2013 at 3:23 PM, Kevin Bourrillion wrote: > Can we make our best attempt to specify Iterable.stream() better than > Iterable.iterator() was? > > I haven't worked out how to *say* this yet, but the idea is: > > - If at all possible to ensure that each call to stream() returns an > actual working and *independent* stream, you really really should do that. > - If that's just not possible, the second call to stream() really really > should throw ISE. > > (Yes, I do realize most Iterables by far will just inherit stream(), so it > will only be as repeat-usable as iterator() is.) 
> > > On Fri, Feb 8, 2013 at 3:20 PM, Kevin Bourrillion wrote: > >> Yeah, I think we have little choice but to do this. It makes sense, and >> without it, Guava will just end up having to offer a static helper method >> to return (iterable instanceof Collection) ? ((Collection) >> iterable).stream() : >> Streams.stream(Streams.spliteratorUnknownSize(iterable.iterator()), 0). >> Blech. >> >> (Tangentially, I would really love to drop parallelStream() and let >> people call stream().parallel(). But I haven't managed to scour the >> archives to find if that argument's already suitably played out.) >> >> >> >> On Fri, Feb 8, 2013 at 2:30 PM, Brian Goetz wrote: >> >>> Currently, we define stream() and parallelStream() on Collection, with >>> default of: >>> >>> default Stream stream() { >>> return Streams.stream(() -> Streams.spliterator(iterator()**, >>> size(), Spliterator.SIZED), Spliterator.SIZED); >>> } >>> >>> default Stream parallelStream() { >>> return stream().parallel(); >>> } >>> >>> So the default behavior is "get an Iterator, turn it into a Spliterator, >>> and turn that into a Stream." Then the specific Collection classes >>> generally override it, providing better Spliterator implementations and >>> more precise flag sets. >>> >>> >>> Several people have requested moving stream/parallelStream up to >>> Iterable, on the theory that (a) the default implementations that would >>> live there are not terrible (only difference between that and Collection >>> default is Iterable doesn't know size()), (b) Collection could still >>> override with the size-injecting version, and (c) a lot of APIs are >>> designed to return Iterable as the "least common denominator" aggregate, >>> and being able to stream them would be useful. I don't see any problem >>> with moving these methods up to Iterable. >>> >>> Any objections? >>> >>> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com >> > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From brian.goetz at oracle.com Fri Feb 8 15:44:34 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 08 Feb 2013 18:44:34 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158AF8.5010509@oracle.com> Message-ID: <51158DE2.1060407@oracle.com> > If I fail in my bid to kill parallelStream() then could we at least keep > it on Collection? With Iterable already growing from 1 to 3 methods, > that one extra is pretty significant bulk. (Still, let's kill it > entirely :-)) I'm not sure I get this "bulking" argument. The implementation on Iterable will be a default. Let's say you're implementing an Iterable. There are two ends of the spectrum: 1. You are building a high-performance data structure. You are definitely going to want to create your own spliterators and offer the best parallel performance. So you are happy to see parallelStream(). 2. You are wrapping some other aggregates that you just want to be Iterable, so you cobble together an Iterator from whatever you've got. In which case you're likely to take the default stream/parallelStream implementations. So you don't care that Iterable has parallelStream. So at the ends, either you like it, or you're agnostic. What's in the middle that's different? I'm not seeing it. From kevinb at google.com Fri Feb 8 15:47:46 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 8 Feb 2013 15:47:46 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51158DE2.1060407@oracle.com> References: <51157C76.8040303@oracle.com> <51158AF8.5010509@oracle.com> <51158DE2.1060407@oracle.com> Message-ID: Sure, sure: it's much more about perception than specific impediment to usage. On Fri, Feb 8, 2013 at 3:44 PM, Brian Goetz wrote: > If I fail in my bid to kill parallelStream() then could we at least keep >> it on Collection? 
With Iterable already growing from 1 to 3 methods, >> that one extra is pretty significant bulk. (Still, let's kill it >> entirely :-)) >> > > I'm not sure I get this "bulking" argument. > > The implementation on Iterable will be a default. > > Let's say you're implementing an Iterable. There are two ends of the > spectrum: > > 1. You are building a high-performance data structure. You are > definitely going to want to create your own spliterators and offer the best > parallel performance. So you are happy to see parallelStream(). > > 2. You are wrapping some other aggregates that you just want to be > Iterable, so you cobble together an Iterator from whatever you've got. In > which case you're likely to take the default stream/parallelStream > implementations. So you don't care that Iterable has parallelStream. > > So at the ends, either you like it, or you're agnostic. What's in the > middle that's different? I'm not seeing it. > > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From dl at cs.oswego.edu Sat Feb 9 04:09:32 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 09 Feb 2013 07:09:32 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> Message-ID: <51163C7C.4050509@cs.oswego.edu> On 02/08/13 18:35, Kevin Bourrillion wrote: > Doug, I am extraordinarily unmoved by this concern. :-) Does a break-even point > moving a few elements in either direction really matter? People dealing with parallel library support need some attitude adjustment about such things. On a soon-to-be-typical machine, every cycle you waste setting up parallelism costs you say 64 cycles. You would probably have had a different reaction if it required 64 object creations to start a parallel computation. 
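[Editor's note: for concreteness, a rough sketch of what the hoisted defaults could look like. The interface name is made up; it is written against the factory names that eventually shipped rather than the draft Streams class, and it leans on Iterable's own spliterator() default, which reports no SIZED characteristic.]

```java
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Sketch only: what stream()/parallelStream() defaults hoisted from
// Collection up to an Iterable-like interface might look like.
interface StreamableIterable<T> extends Iterable<T> {
    // Iterable does not know size(), so the spliterator is unsized;
    // Collection would override with a size-injecting spliterator.
    default Stream<T> stream() {
        return StreamSupport.stream(spliterator(), false);
    }
    default Stream<T> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }
}
```

This is Brian's case 2: an implementor who only supplies iterator() inherits both methods for free, at the cost of an unsized, characteristics-free spliterator.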
That said, I'm always completely supportive of forcing implementors to work harder for the sake of better APIs, so long as the APIs do not rule out efficient implementation. So if killing parallelStream is really important, we'll find some way to turn stream().parallel() into a bit-flip or somesuch. -Doug > > > On Fri, Feb 8, 2013 at 3:28 PM, Brian Goetz > wrote: > > Depends how seriously you are counting. Doug counts individual object > creations and virtual invocations on the way to a parallel operation, > because until you start forking, you're on the wrong side of Amdahl's law -- > this is all "serial fraction" that happens before you can fork any work, > which pushes your breakeven threshold further out. So getting the setup > path for parallel ops fast is valuable. > > > On 2/8/2013 6:25 PM, Kevin Bourrillion wrote: > > On Fri, Feb 8, 2013 at 3:24 PM, Brian Goetz > __>> wrote: > > (Tangentially, I would really love to drop parallelStream() and let > people call stream().parallel(). But I haven't managed to scour the > archives to find if that argument's already suitably played out.) > > > Direct version is more performant, in that it requires less wrapping > (to turn a stream into a parallel stream, you have to first create > the sequential stream, then transfer ownership of its state into a > new Stream.) > > > But really a lot of /work/ has already happened by then? > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > > > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From kevinb at google.com Sat Feb 9 07:36:41 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Sat, 9 Feb 2013 07:36:41 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51163C7C.4050509@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> Message-ID: On Sat, Feb 9, 2013 at 4:09 AM, Doug Lea
wrote: On 02/08/13 18:35, Kevin Bourrillion wrote: > >> Doug, I am extraordinarily unmoved by this concern. :-) Does a >> break-even point >> moving a few elements in either direction really matter? >> > > People dealing with parallel library support need some attitude > adjustment about such things. On a soon-to-be-typical machine, > every cycle you waste setting up parallelism costs you say 64 cycles. > You would probably have had a different reaction if it required 64 > object creations to start a parallel computation. > Well, that would also have 64x the effect on young gen GC. I still wouldn't immediately blanch at the 64 allocations. Do users really want to use parallelism to get savings *that* small? I thought we would care more about the cases in which the parallelism is a huge win, not so marginal. That said, I'm always completely supportive of forcing implementors > to work harder for the sake of better APIs, so long as the > APIs do not rule out efficient implementation. So if killing > parallelStream is really important, we'll find some way to > turn stream().parallel() into a bit-flip or somesuch. > I will stop short of trying to convince us it's "important", but I would definitely agree that if the cost is only some implementation ugliness, that shouldn't be enough to justify the method existing. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From forax at univ-mlv.fr Sat Feb 9 07:42:04 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 09 Feb 2013 16:42:04 +0100 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> Message-ID: <51166E4C.1070900@univ-mlv.fr> On 02/09/2013 04:36 PM, Kevin Bourrillion wrote: > On Sat, Feb 9, 2013 at 4:09 AM, Doug Lea
> wrote: > > On 02/08/13 18:35, Kevin Bourrillion wrote: > > Doug, I am extraordinarily unmoved by this concern. :-) Does > a break-even point > moving a few elements in either direction really matter? > > > People dealing with parallel library support need some attitude > adjustment about such things. On a soon-to-be-typical machine, > every cycle you waste setting up parallelism costs you say 64 cycles. > You would probably have had a different reaction if it required 64 > object creations to start a parallel computation. > > > Well, that would also have 64x the effect on young gen GC. > > I still wouldn't immediately blanch at the 64 allocations. Do users > really want to use parallelism to get savings /that/ small? I thought > we would care more about the cases in which the parallelism is a huge > win, not so marginal. It depends if the operation that you perform for each item take a long time or not. > > > That said, I'm always completely supportive of forcing implementors > to work harder for the sake of better APIs, so long as the > APIs do not rule out efficient implementation. So if killing > parallelStream is really important, we'll find some way to > turn stream().parallel() into a bit-flip or somesuch. > > > I will stop short of trying to convince us it's "important", but I > would definitely agree that if the cost is only some implementation > ugliness, that shouldn't be enough to justify the method existing. 
Rémi From forax at univ-mlv.fr Sat Feb 9 07:44:34 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 09 Feb 2013 16:44:34 +0100 Subject: explode In-Reply-To: <51151E05.50908@oracle.com> References: <5101706E.3030601@oracle.com> <51101BF7.9070409@oracle.com> <5112D9FC.9010707@oracle.com> <5112DE30.7030704@univ-mlv.fr> <5112E787.5090809@oracle.com> <51151E05.50908@oracle.com> Message-ID: <51166EE2.4090408@univ-mlv.fr> On 02/08/2013 04:47 PM, Brian Goetz wrote: > OK, just to put it all down on "paper" where flatMap landed...are we > OK with this? > > java.util.stream.FlatMapper: > > public interface FlatMapper<T, U> { > void explodeInto(T element, Consumer<U> sink); > > interface ToInt<T> { > void explodeInto(T element, IntConsumer sink); > } > > interface ToLong<T> { > void explodeInto(T element, LongConsumer sink); > } > > interface ToDouble<T> { > void explodeInto(T element, DoubleConsumer sink); > } > > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > > In Stream: > > <R> Stream<R> flatMap(Function<T, Stream<R>> mapper); just a wildcard issue: <R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper); Rémi From dl at cs.oswego.edu Sat Feb 9 07:47:28 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 09 Feb 2013 10:47:28 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> Message-ID: <51166F90.301@cs.oswego.edu> On 02/09/13 10:36, Kevin Bourrillion wrote: > I still wouldn't immediately blanch at the 64 allocations. Do users really want > to use parallelism to get savings /that/ small? I thought we would care more > about the cases in which the parallelism is a huge win, not so marginal. 
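[Editor's note: Remi's wildcard point is easy to see with a concrete mapper. The names below are invented for illustration; the widened signature he proposes is the one Java 8 ultimately shipped for Stream.flatMap.]

```java
import java.util.function.Function;
import java.util.stream.Stream;

// With the wildcard-widened signature
//   <R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper)
// a mapper declared against a supertype of T, returning a stream of a
// subtype of R, still applies directly -- no adapter lambda needed.
class WildcardDemo {
    // Declared on Object, producing Stream<Integer>:
    static final Function<Object, Stream<Integer>> CODE_POINTS =
            o -> o.toString().chars().boxed();

    static long countCodePoints(Stream<String> words) {
        // Compiles only because of ? super String and ? extends Stream<? extends Number>.
        Stream<Number> flat = words.flatMap(CODE_POINTS);
        return flat.count();
    }
}
```

Without the wildcards, CODE_POINTS could not be passed to a Stream<String>'s flatMap at all, even though it is perfectly usable there.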
If you take the "what's one more cycle" point of view consistently, then it would never be worth trying to parallelize anything. So minimizing seq overhead while keeping nice APIs is the *only* success criterion. > > I will stop short of trying to convince us it's "important", but I would > definitely agree that if the cost is only some implementation ugliness, that > shouldn't be enough to justify the method existing. Here's another breach in my promise not to have an opinion about anything in the Stream API: I think "parallelStream()" is much nicer than "stream().parallel()". -Doug From forax at univ-mlv.fr Sat Feb 9 07:57:11 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 09 Feb 2013 16:57:11 +0100 Subject: Refactor of Collector interface In-Reply-To: <51152975.1040305@oracle.com> References: <511518D1.5050706@oracle.com> <51152975.1040305@oracle.com> Message-ID: <511671D7.6070607@univ-mlv.fr> On 02/08/2013 05:36 PM, Brian Goetz wrote: > Your subjective sense is accurate, which is why I brought this up. > This may be an example where it is better to depart from the traditional > approach. > > To your question, it depends what you mean by "purely to do with an > implementor." Collector *users* are going to be burdened with the > performance consequences of multiple layers of wrapping/conversion. > > The implementation used to be full of alternation between: > > interface Foo<T, U> { > U transform(T t); > } > > class FooAdapter<T, U> { > FooAdapter(Function<T, U> lambda) { ... } > > U transform(T t) { return lambda.apply(t); } > } > > and > > Function<T, U> parentTransformer = foo::transform; > > and back again, introducing layers of wrapping even when the function > is not changing across layers. Yes, the other problem is that if we have something recursive, we could easily end up with a chain of adapters as long as the number of recursive calls. 
This problem frequently arises in dynamic language runtimes when, for example, you convert from j.l.String to GroovyString and back again. The only sane way to implement that is to provide a way to box and unbox things. So having Collector be a triple seems to be the only sane choice. Rémi > > > > On 2/8/2013 11:22 AM, Kevin Bourrillion wrote: >> My subjective sense of good Java API design very strongly prefers the >> "before" picture here, which I see as a lot more "Java-like", so I'm >> taking a closer look. >> >> I assume that the trade-offs we're weighing here are purely to do with >> what it's like to be a Collector implementor, correct? >> >> >> On Fri, Feb 8, 2013 at 7:25 AM, Brian Goetz > > wrote: >> >> FYI: In a recent refactoring, I changed: >> >> public interface Collector<T, R> { >> R makeResult(); >> void accumulate(R result, T value); >> R combine(R result, R other); >> } >> >> to >> >> public interface Collector<T, R> { >> Supplier<R> resultSupplier(); >> BiConsumer<R, T> accumulator(); >> BinaryOperator<R> combiner(); >> } >> >> Basically, this is a refactoring from a typical interface to a >> tuple-of-lambdas. What I found was that there was a lot of >> adaptation going on, where something would start out as a lambda, >> we'd wrap it with a Collector whose method invoked the lambda, then >> take a method reference to that wrapping method and then later wrap >> that with another Collector, etc. By keeping access to the >> functions directly, the Collectors code got simpler and less wrappy, >> since a lot of functions could just be passed right through without >> wrapping. And a lot of stupid adapter classes went away. >> >> While clearly we don't want all interfaces to evolve this way, this >> is one where *all* the many layers of manipulations are effectively >> function composition, and exposing the function-ness made that >> cleaner and more performant. So while I don't feel completely >> super-great about it, I think it's enough of a win to keep. 
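[Editor's note: a runnable sketch of the "tuple-of-lambdas" shape under discussion. Type parameters and names are assumed from context; the Collector that ultimately shipped in java.util.stream adds a finisher and a characteristics set on top of this triple.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// Assumed reconstruction of the refactored interface.
interface TripleCollector<T, R> {
    Supplier<R> resultSupplier();
    BiConsumer<R, T> accumulator();
    BinaryOperator<R> combiner();
}

class TripleCollectors {
    // Because the three pieces are exposed as functions, a collector is
    // just three lambdas/method refs -- no adapter class wrapping each one.
    static <T> TripleCollector<T, List<T>> toList() {
        return new TripleCollector<T, List<T>>() {
            public Supplier<List<T>> resultSupplier() { return ArrayList::new; }
            public BiConsumer<List<T>, T> accumulator() { return List::add; }
            public BinaryOperator<List<T>> combiner() {
                return (left, right) -> { left.addAll(right); return left; };
            }
        };
    }
}
```

Downstream code can now pass resultSupplier()/accumulator()/combiner() through directly, which is exactly the "less wrappy" property Brian describes.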
>> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >> From kevinb at google.com Sat Feb 9 08:07:56 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Sat, 9 Feb 2013 08:07:56 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51166F90.301@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> Message-ID: Belated disclaimer: one should always read my comments on performance as "please educate me because I don't get it", not "you're all wrong". :-) On Sat, Feb 9, 2013 at 7:47 AM, Doug Lea
wrote: On 02/09/13 10:36, Kevin Bourrillion wrote: > > I still wouldn't immediately blanch at the 64 allocations. Do users >> really want >> to use parallelism to get savings /that/ small? I thought we would care >> more >> >> about the cases in which the parallelism is a huge win, not so marginal. >> > > If you take the "what's one more cycle" point of view consistently, then > it would never be worth trying to parallelize anything. So minimizing > seq overhead while keeping nice APIs is the *only* success criterion. > I will stop short of trying to convince us it's "important", but I would >> definitely agree that if the cost is only some implementation ugliness, >> that >> shouldn't be enough to justify the method existing.Here's >> > > Here's another breach in my promise not to have an opinion > about anything in the Steam API: I think "parallelStream()" > is much nicer than "stream().parallel()". But the choice isn't precisely between those two; it's between having one or both. I assume that the stream().parallel() option has to exist regardless, and so users will encounter it in code, and they *will* have to start discussions with each other about "why did you do s().p() instead of .pS(), or vice versa, and what's the difference anyway?" Then, every time someone *adds* a stream() method to their type they then face the question of whether they're supposed to add parallelStream() too, etc. I don't think absolute normalization is an API design goal in itself, but having two very similar ways to do the same thing is a definite smell. -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From dl at cs.oswego.edu Sat Feb 9 08:31:17 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 09 Feb 2013 11:31:17 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> Message-ID: <511679D5.2090508@cs.oswego.edu> On 02/09/13 11:07, Kevin Bourrillion wrote: > But the choice isn't precisely between those two; it's between having one or > both. I assume that the stream().parallel() option has to exist regardless, and > so users will encounter it in code, and they /will/ have to start discussions > with each other about "why did you do s().p() instead of .pS(), or vice versa, > and what's the difference anyway?" Then, every time someone /adds/ a stream() > method to their type they then face the question of whether they're supposed to > add parallelStream() too, etc. > Well, I don't like the parallel() method on Stream anyway, so I'll let others take over from here... -Doug From brian.goetz at oracle.com Sat Feb 9 08:31:52 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 09 Feb 2013 11:31:52 -0500 Subject: stream() / parallelStream() methods In-Reply-To: <51166F90.301@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> Message-ID: <511679F8.1040407@oracle.com> > Here's another breach in my promise not to have an opinion > about anything in the Steam API: I think "parallelStream()" > is much nicer than "stream().parallel()". I do too, but I also recognize that is mostly just taste and we could get used to either. 
But, let's turn the question around, because we have an inconsistent API right now with respect to stream constructors, and we should decide whether we want to choose that deliberately (which I think is fine), or go one way or the other. We have a number of factories in Streams like: Streams.intRange(from, to) Streams.generate(T, UnaryOperator) We do *not* have explicit parallel versions of each of these; we did originally, and to prune down the API surface area, we cut them on the theory that dropping 20+ methods from the API was worth the tradeoff of the surface yuckiness and performance cost of .intRange(...).parallel(). But we did not make that choice with Collection. We could either remove the Collection.parallelStream(), or we could add the parallel versions of all the generators, or we could do nothing and leave it as is. I think all are justifiable on API design grounds. I kind of like the status quo, despite its inconsistency. Instead of having 2N stream construction methods, we have N+1 -- but that extra 1 covers a huge number of cases, because it is inherited by every Collection. So I can justify to myself why having that extra 1 method is worth it, and why accepting the inconsistency of going no further is acceptable. Do others disagree? Is N+1 the practical choice here? Or should we go for the purity of N? Or the convenience and consistency of 2N? Or is there some even better N+3, for some other specially chosen cases we want to give special support to? 
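[Editor's note: the N+1 trade-off in today's terms. Method names are as they eventually shipped -- IntStream.range rather than the draft's Streams.intRange -- and the two forms are assumed equivalent in result, though not in setup cost.]

```java
import java.util.List;
import java.util.stream.IntStream;

class NPlusOneDemo {
    // The "+1": every Collection inherits one dedicated parallel entry point.
    static long sumViaCollection(List<Integer> list) {
        return list.parallelStream().mapToLong(Integer::longValue).sum();
    }

    // The generator factories instead pay the wrap-then-flip cost of .parallel().
    static long sumViaRange(int from, int to) {
        return IntStream.range(from, to).parallel().asLongStream().sum();
    }
}
```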
From brian.goetz at oracle.com Sat Feb 9 08:41:35 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 09 Feb 2013 11:41:35 -0500 Subject: stream() / parallelStream() methods In-Reply-To: <511679D5.2090508@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> Message-ID: <51167C3F.1000704@oracle.com> > Well, I don't like the parallel() method on Stream anyway, so I'll > let others take over from here... You can't drop a bomb like that and walk away! You have to explain why you don't like it, because I suspect most people's first guess about why will be wrong. I'll take my best stab at explaining why: because it, like the stateful methods (sort, distinct, limit) which you also don't like, moves us incrementally farther from being able to express stream pipelines in terms of traditional data-parallel constructs, which further constrains our ability to map them directly to tomorrow's computing substrate, whether that be vector processors, FPGAs, GPUs, or whatever we cook up. Filter-map-reduce maps very cleanly to all sorts of parallel computing substrates; filter-parallel-map-sequential-sorted-limit-parallel-map-uniq-reduce does not. So the whole API design here embodies many tensions between making it easy to express things the user is likely to want to express, and doing so in a manner that we can predictably make fast with transparent cost models. 
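[Editor's note: the contrast can be made concrete. Illustrative only; the ranges and operations are chosen arbitrarily.]

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PipelineShapes {
    // Stateless filter-map-reduce: each element is processed independently,
    // so the pipeline splits cleanly on any parallel substrate.
    static int statelessSum() {
        return IntStream.rangeClosed(1, 100).parallel()
                .filter(i -> i % 3 == 0)
                .map(i -> i * 2)
                .sum();
    }

    // Stateful sorted()/limit(): barriers between stages -- the whole input
    // must be seen before the first sorted element can be emitted.
    static List<Integer> statefulFirstTen() {
        return IntStream.rangeClosed(1, 100).parallel()
                .map(i -> 100 - i)   // 99 down to 0
                .sorted()
                .limit(10)
                .boxed()
                .collect(Collectors.toList());
    }
}
```

Both produce deterministic results, but only the first shape composes into the "traditional data-parallel constructs" Brian is talking about.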
From dl at cs.oswego.edu Sat Feb 9 08:49:17 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 09 Feb 2013 11:49:17 -0500 Subject: stream() / parallelStream() methods In-Reply-To: <51167C3F.1000704@oracle.com> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> <51167C3F.1000704@oracle.com> Message-ID: <51167E0D.8060202@cs.oswego.edu> On 02/09/13 11:41, Brian Goetz wrote: > You can't drop a bomb like that and walk away! You have to explain why you > don't like it, because I suspect most people's first guess about why will be wrong. > > I'll take my best stab at explaining why: Yes, thanks. Stateful Stream methods are clearly problematic. Most people like them anyway because they are convenient. And in any case, whenever they show up, many API discussions follow. > because it (like the stateful methods > (sort, distinct, limit)) which you also don't like, move us incrementally > farther from being able to express stream pipelines in terms of traditional > data-parallel constructs, which further constrains our ability to to map them > directly to tomorrow's computing substrate, whether that be vector processors, > FPGAs, GPUs, or whatever we cook up. > > Filter-map-reduce map very cleanly to all sorts of parallel computing > substrates; filter-parallel-map-sequential-sorted-limit-parallel-map-uniq-reduce > does not. > > So the whole API design here embodies many tensions between making it easy to > express things the user is likely to want to express, and doing is in a manner > that we can predictably make fast with transparent cost models. 
> From joe.bowbeer at gmail.com Sat Feb 9 10:13:59 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 9 Feb 2013 10:13:59 -0800 Subject: stream() / parallelStream() methods In-Reply-To: <51167E0D.8060202@cs.oswego.edu> References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> <51167C3F.1000704@oracle.com> <51167E0D.8060202@cs.oswego.edu> Message-ID: I'm OK with parallelStream(). It did raise a question when I used it for the first time, but it was also easy to find in the IDE. I wanted "parallel" and knew what I was getting into; as opposed to someone splicing a parallel() into their expression as an afterthought.. The separation of parallel() from stream() also presents more possibilities for the user, and therefore also raises questions. Where in the expression does parallel() belong? In the parallel string-compare example, I had a choice between boxed().parallel() or parallel().boxed(). Which is "right"? Or maybe I should insert parallel() even later in the expression? On 02/09/13 11:41, Brian Goetz wrote: You can't drop a bomb like that and walk away! You have to explain why you > don't like it, because I suspect most people's first guess about why will > be wrong. > > I'll take my best stab at explaining why: > Yes, thanks. Stateful Stream methods are clearly problematic. Most people like them anyway because they are convenient. And in any case, whenever they show up, many API discussions follow. because it (like the stateful methods > (sort, distinct, limit)) which you also don't like, move us incrementally > farther from being able to express stream pipelines in terms of traditional > data-parallel constructs, which further constrains our ability to to map > them > directly to tomorrow's computing substrate, whether that be vector > processors, > FPGAs, GPUs, or whatever we cook up. 
> > Filter-map-reduce map very cleanly to all sorts of parallel computing > substrates; filter-parallel-map-**sequential-sorted-limit-** > parallel-map-uniq-reduce > does not. > > So the whole API design here embodies many tensions between making it easy > to > express things the user is likely to want to express, and doing is in a > manner > that we can predictably make fast with transparent cost models. > > From tim at peierls.net Sat Feb 9 10:22:42 2013 From: tim at peierls.net (Tim Peierls) Date: Sat, 9 Feb 2013 13:22:42 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> <51167C3F.1000704@oracle.com> <51167E0D.8060202@cs.oswego.edu> Message-ID: On Sat, Feb 9, 2013 at 1:13 PM, Joe Bowbeer wrote: > The separation of parallel() from stream() also presents more > possibilities for the user, and therefore also raises questions. Where in > the expression does parallel() belong? In the parallel string-compare > example, I had a choice between boxed().parallel() or parallel().boxed(). > Which is "right"? Or maybe I should insert parallel() even later in the > expression? > Yup, that's the sort of uncertainty that really slows me down. All those choices to make, especially when the type system doesn't yield clues. I'd rather give up on clean ways to express certain intricate (and uncommon?) combinations than have to make choices like these. 
--tim From brian.goetz at oracle.com Sat Feb 9 10:48:49 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 09 Feb 2013 13:48:49 -0500 Subject: stream() / parallelStream() methods In-Reply-To: References: <51157C76.8040303@oracle.com> <51158912.7030005@oracle.com> <51158A16.4000703@oracle.com> <51163C7C.4050509@cs.oswego.edu> <51166F90.301@cs.oswego.edu> <511679D5.2090508@cs.oswego.edu> <51167C3F.1000704@oracle.com> <51167E0D.8060202@cs.oswego.edu> Message-ID: <51169A11.1050903@oracle.com> > The separation of parallel() from stream() also presents more > possibilities for the user, and therefore also raises questions. Where > in the expression does parallel() belong? In the parallel string-compare > example, I had a choice between boxed().parallel() or > parallel().boxed(). Which is "right"? Or maybe I should insert > parallel() even later in the expression? Good question. Clearly more education will be needed here. There's two axes on which to evaluate how to use .parallel() and .sequential(); semantic and performance. The semantics are straightforward. If a stream starts out sequential, then: foo.filter(...).parallel().map(...) will do the filtering sequentially and the mapping in parallel. Whereas foo.parallel().filter(...).map(...) will do both in parallel. I think users can understand that aspect of it; it seems pretty straightforward. If the stream is already s/p then s()/p() are no-ops (well, a single virtual call and a field read.) On the performance front, that's always a moving target, but currently .parallel() on a "naked" (no ops added yet, as in the second case) stream is much cheaper than .parallel() on a stream that already has ops (like in the first case.) From sam at sampullara.com Sat Feb 9 11:26:25 2013 From: sam at sampullara.com (Sam Pullara) Date: Sat, 9 Feb 2013 11:26:25 -0800 Subject: Internal and External truncation conditions Message-ID: Now that we are further along, I wanted to bring this up again. 
I don't think that forEachUntil is sufficient for handling internal and external conditions that should truncate stream processing. I've also looked at CloseableStream and that doesn't appear to help since it isn't possible to wrap a Stream (say an infinite stream) with a CloseableStream and get the necessary semantics of cancellation. Also, other APIs that don't consider that you might give them a CloseableStream will likely still give you back a Stream thus losing the semantics. Is everyone else happy with forEachUntil and CloseableStream? Sam ---------- Forwarded message ---------- From: Sam Pullara Date: Mon, Dec 31, 2012 at 8:34 AM Subject: Re: Cancelation -- use cases To: Brian Goetz Cc: "lambda-libs-spec-experts at openjdk.java.net" I think we are conflating two things with this solution and it doesn't work for them in my mind. Here is what I would like the solution to cover: - External conditions (cancellation, cleanup) - Internal conditions (gating based on count, elements and results) The first one may be the only one that works in the parallel case. It should likely be implemented with .close() on stream that would stop the stream as soon as possible. This would be useful for things like timeouts. Kind of like calling close on an inputstream in the middle of reading it. The other one I think is necessary and hard to implement correctly with the parallel case. For instance I would like to say: stream.gate(e -> e < 10).forEach(e -> ?) OR stream.gate( (e, i) -> e < 10 || i > 10).forEach(e -> ?) // i is the number of the current element That should give me every element in the stream until an element isn't < 10 and then stop processing elements. Further, there should be some way for the stream source to be notified that we are done consuming it in case it is of unknown length or consumes resources. That would be more like (assuming we add a Runnable call to Timer): Stream stream = ?. 
new Timer().schedule(() -> stream.close(), 5000);
stream.forEach(e -> ...);

OR

stream.forEach(e -> { try { ... } catch (Exception ex) { stream.close(); } });

Sadly, the first gate() case doesn't work well when parallelized. I'm willing to just specify what the behavior is for that case to get it into the API. For example, I would probably say something like "the gate will need to return false once per split to stop processing". In either of these cases I think one of the motivations needs to be that the stream may be using resources and we need to tell the source that we are done consuming it. For example, if the stream is sourced from a file, database or even a large amount of memory, there should be a notification mechanism for doneness that will allow those resources to be returned before the stream is exhausted. To that end I think that Stream should extend AutoCloseable, but overridden with no checked exception.

interface Stream extends AutoCloseable {
    /**
     * Closes this stream and releases any system resources associated
     * with it. If the stream is already closed then invoking this
     * method has no effect. Close is automatically called when the
     * stream is exhausted. After this is called, no further elements
     * will be processed by the stream, but currently processing elements
     * will complete normally. Calling other methods on a closed stream will
     * produce IllegalStateExceptions.
     */
    void close();

    /**
     * When the continueProcessing function returns false, no further
     * elements will be processed after the gate. In the parallel stream
     * case no further elements will be processed in the current split.
     */
    Stream gate(Predicate continueProcessing);

    /**
     * As gate, with the addition of the current element number.
     */
    Stream gate(BiPredicate continueProcessing);
}

This API avoids a lot of side effects that forEachUntil would require to implement these use cases.
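[A minimal sketch of how a gate() of this shape could be emulated on today's API with a stateful filter. Both gate() and this emulation are hypothetical; the emulation is sequential-only and discards elements after the gate closes rather than short-circuiting, so it does not terminate an infinite source.]

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GateSketch {
    // Hypothetical gate(): pass elements through until the predicate first
    // fails, then drop everything after it. Once closed, the gate stays closed.
    static <T> Stream<T> gate(Stream<T> in, Predicate<T> keepGoing) {
        AtomicBoolean open = new AtomicBoolean(true);
        return in.sequential().filter(e -> {
            if (open.get() && keepGoing.test(e)) {
                return true;      // gate still open: element flows downstream
            }
            open.set(false);      // first failing element closes the gate for good
            return false;
        });
    }

    public static void main(String[] args) {
        List<Integer> kept = gate(Stream.of(1, 5, 12, 3), e -> e < 10)
                .collect(Collectors.toList());
        System.out.println(kept); // [1, 5]: 3 is dropped even though 3 < 10
    }
}
```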
Sam

On Dec 30, 2012, at 7:53 PM, Brian Goetz wrote:

Here's a lower-complexity version of cancel, that still satisfies (in series or in parallel) use cases like the following:

> - Find the best possible move after thinking for 5 seconds
> - Find the first solution that is better than X
> - Gather solutions until we have 100 of them

without bringing in the complexity or time/space overhead of dealing with encounter order.

Since the forEach() operation works exclusively on the basis of temporal/arrival order rather than spatial/encounter order (elements are passed to the lambda in whatever order they are available, in whatever thread they are available), we could make a canceling variant of forEach:

.forEachUntil(Block sink, BooleanSupplier until)

Here, there is no confusion about what happens in the ordered case, no need to buffer elements, etc. Elements flow into the block until the termination condition transpires, at which point there are no more splits and existing splits dispense no more elements.

I implemented this (it was trivial) and wrote a simple test program to calculate primes sequentially and in parallel, counting how many could be calculated in a fixed amount of time, starting from an infinite generator and filtering out composites:

Streams.iterate(from, i -> i + 1)   // sequential
       .filter(i -> isPrime(i))
       .forEachUntil(i -> {
           chm.put(i, true);
       }, () -> System.currentTimeMillis() >= start + num);

vs

Streams.iterate(from, i -> i + 1)   // parallel
       .parallel()
       .filter(i -> isPrime(i))
       .forEachUntil(i -> {
           chm.put(i, true);
       }, () -> System.currentTimeMillis() >= start + num);

On a 4-core Q6600 system, in a fixed amount of time, the parallel version gathered ~3x as many primes.

In terms of being able to perform useful computations on infinite streams, this seems a pretty attractive price-performer; lower spec and implementation complexity, and covers many of the use cases which would otherwise be impractical to attack with the stream approach.
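[The sequential half of the forEachUntil proposal above can be sketched as a pull loop over the stream's iterator. forEachUntil is the proposed method, not an existing one; the parallel variant needs engine support and is not captured by this emulation.]

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class ForEachUntilSketch {
    // Sequential emulation of the proposed forEachUntil(sink, until):
    // push elements into the sink until the termination condition fires.
    static <T> void forEachUntil(Stream<T> s, Consumer<T> sink, BooleanSupplier until) {
        Iterator<T> it = s.iterator();
        while (!until.getAsBoolean() && it.hasNext()) {
            sink.accept(it.next());
        }
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        // Infinite generator, truncated by the condition rather than the source.
        forEachUntil(Stream.iterate(1, i -> i + 1), seen::add, () -> seen.size() >= 5);
        System.out.println(seen); // [1, 2, 3, 4, 5]
    }
}
```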
On 12/28/2012 11:20 AM, Brian Goetz wrote:

I've been working through some alternatives for cancellation support in infinite streams. Looking to gather some use case background to help evaluate the alternatives.

In the serial case, the "gate" approach works fine: after some criteria transpires, stop sending elements downstream. The pipeline flushes the elements it has, and completes early.

In the parallel unordered case, the gate approach similarly works fine: after the cancelation criteria occurs, no new splits are created, and existing splits dispense no more elements. The computation similarly quiesces after elements currently being processed are completed, possibly along with any up-tree merging to combine results.

It is the parallel ordered case that is tricky. Supposing we partition a stream into

(a1,a2,a3), (a4,a5,a6)

and suppose further we happen to be processing a5 when the bell goes off. Do we want to wait for all a_i, i<5, to finish before letting the computation quiesce?

My gut says: for the things we intend to cancel, most of them will be order-insensitive anyway. Things like:

- Find the best possible move after thinking for 5 seconds
- Find the first solution that is better than X
- Gather solutions until we have 100 of them

I believe the key use case for cancelation here will be when we are chewing on potentially infinite streams of events (probably backed by IO) where we want to chew until we're asked to shut down, and want to get as much parallelism as we can cheaply. Which suggests to me the intersection between order-sensitive stream pipelines and cancelable stream pipelines is going to be pretty small indeed.

Anyone want to add to this model of use cases for cancelation?

From joe.bowbeer at gmail.com Sat Feb 9 11:36:59 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Sat, 9 Feb 2013 11:36:59 -0800
Subject: Internal and External truncation conditions
In-Reply-To:
References:
Message-ID:

I haven't used either of these.
If I wanted to create an example I'd probably start with a stream of lines() from a BufferedReader, and then tack on a use case.

try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) {
    r.lines().forEachUntil(...);
}

Do you have something specific in mind?

On Sat, Feb 9, 2013 at 11:26 AM, Sam Pullara wrote:
> Now that we are further along, I wanted to bring this up again. I
> don't think that forEachUntil is sufficient for handling internal and
> external conditions that should truncate stream processing.
> [...]
From sam at sampullara.com Sat Feb 9 11:55:04 2013
From: sam at sampullara.com (Sam Pullara)
Date: Sat, 9 Feb 2013 11:55:04 -0800
Subject: Internal and External truncation conditions
In-Reply-To:
References:
Message-ID:

Let's say you only want to process lines until one matches a regex. Here is one way you could try to implement it:

AtomicBoolean done = new AtomicBoolean();
Pattern regex = ...;
try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) {
    r.lines().forEachUntil((line) -> {
        if (!done.get()) {
            if (regex.matcher(line).matches()) {
                done.set(true);
            } else {
                ...process the line...
            }
        }
    }, done::get);
}

In the parallel case this completely breaks down, since the lines can be processed out of order. Gate would have to ensure that didn't happen.

Pattern regex = ...;
try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) {
    r.lines().gate((line) -> !regex.matcher(line).matches()).forEach((line) -> ...process the line...);
}

If we wanted to cancel the operation asynchronously we would just call Stream.close(), and that should work with any stream.
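[For comparison, the gate/regex pipeline above is close in shape to a short-circuiting take-while operation. A sketch assuming Stream.takeWhile, which is a Java 9+ API rather than anything available in this discussion; unlike a plain filter, it stops consuming the source at the first match.]

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TakeWhileSketch {
    public static void main(String[] args) {
        Pattern stop = Pattern.compile("END");
        // takeWhile short-circuits the traversal: elements at and after the
        // first terminator are never pulled from the source.
        List<String> processed = Stream.of("a", "b", "END", "c")
                .takeWhile(line -> !stop.matcher(line).matches())
                .collect(Collectors.toList());
        System.out.println(processed); // [a, b]
    }
}
```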
In the current case we can't stop a stream from continuing to execute without explicitly adding a forEachUntil() at the end of it, with a condition variable that we then change out of band. Also, since there may be reduction operations in the middle, that may not even stop all of those operations from completing. This can be especially bad for things that should time out.

Sam

On Sat, Feb 9, 2013 at 11:36 AM, Joe Bowbeer wrote:
> I haven't used either of these. If I wanted to create an example I'd
> probably start with a stream of lines() from a BufferedReader, and then tack
> on a use case.
> [...]

From forax at univ-mlv.fr Sat Feb 9 15:24:39 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 10 Feb 2013 00:24:39 +0100
Subject: Internal and External truncation conditions
In-Reply-To:
References:
Message-ID: <5116DAB7.70809@univ-mlv.fr>

If forEachUntil takes a function that returns a boolean, it's easy.
try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) {
    return r.lines().parallel().forEachWhile(line -> {
        if (regex.matcher(line).matches()) {
            return false;
        }
        ...process the line...
        return true;
    });
}

cheers,
Rémi

On 02/09/2013 08:55 PM, Sam Pullara wrote:
> Let's say you only want to process lines until it matches a regex.
> Here is one way you could try to implement it:
> [...]
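[The boolean-returning forEachWhile sketched above can be approximated with the existing short-circuiting allMatch terminal operation, at the cost of putting side effects in a predicate. This is shown only as a comparison under that caveat, not as a recommendation, and it relies on sequential in-order evaluation.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class ForEachWhileSketch {
    public static void main(String[] args) {
        Pattern stop = Pattern.compile("END");
        List<String> processed = new ArrayList<>();
        // allMatch short-circuits on the first false, i.e. at the first line
        // matching the terminator, so later elements are never evaluated.
        Stream.of("a", "b", "END", "c").allMatch(line -> {
            if (stop.matcher(line).matches()) {
                return false;        // terminator found: stop the traversal
            }
            processed.add(line);     // "process the line" (side effect: the hack)
            return true;
        });
        System.out.println(processed); // [a, b]
    }
}
```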
Supposing we partition >>> a stream into >>> (a1,a2,a3), (a4,a5,a6) >>> >>> And suppose further we happen to be processing a5 when the bell goes >>> off. Do we want to wait for all a_i, i<5, to finish before letting the >>> computation quiesce? >>> >>> My gut says: for the things we intend to cancel, most of them will be >>> order-insensitive anyway. Things like: >>> >>> - Find the best possible move after thinking for 5 seconds >>> - Find the first solution that is better than X >>> - Gather solutions until we have 100 of them >>> >>> I believe the key use case for cancelation here will be when we are >>> chewing on potentially infinite streams of events (probably backed by >>> IO) where we want to chew until we're asked to shut down, and want to >>> get as much parallelism as we can cheaply. Which suggests to me the >>> intersection between order-sensitive stream pipelines and cancelable >>> stream pipelines is going to be pretty small indeed. >>> >>> Anyone want to add to this model of use cases for cancelation? >> From sam at sampullara.com Sat Feb 9 15:49:12 2013 From: sam at sampullara.com (Sam Pullara) Date: Sat, 9 Feb 2013 15:49:12 -0800 Subject: Internal and External truncation conditions In-Reply-To: <5116DAB7.70809@univ-mlv.fr> References: <5116DAB7.70809@univ-mlv.fr> Message-ID: I think the point of forEachUntil is that Brian doesn't want to do this as it has issues when parallelized. I think that it is an important enough use case that we handle it anyway. This is better than forEachUntil though. Sam On Sat, Feb 9, 2013 at 3:24 PM, Remi Forax wrote: > if forEachUntil takes a function that return a boolean, it's easy. 
> > > try (BufferedReader r = Files.newBufferedReader(path, > Charset.defaultCharset())) { > return r.lines().parallel().forEachWhile(line -> { > if (regex.matcher(line).matches()) { > return false; > } > ...process the line > return true; > } > } > > cheers, > Rémi > > > On 02/09/2013 08:55 PM, Sam Pullara wrote: >> >> Let's say you only want to process lines until it matches a regex. >> Here is one way you could try to implement it: >> >> AtomicBoolean done = new AtomicBoolean(); >> Pattern regex = ...; >> try (BufferedReader r = Files.newBufferedReader(path, >> Charset.defaultCharset())) { >> r.lines().forEachUntil( (line) -> { >> if (!done.get()) { >> if (regex.matcher(line).matches()) { >> done.set(true); >> } else { >> ...process the line... >> } >> } >> }, done::get); >> } >> >> In the parallel case this completely breaks down since the lines can >> be processed out of order. Gate would have to ensure that didn't >> happen. >> >> Pattern regex = ...; >> try (BufferedReader r = Files.newBufferedReader(path, >> Charset.defaultCharset())) { >> r.lines().gate((line) -> !regex.matcher(line).matches()).forEach((line) >> -> .. process line ..); >> } >> >> If we wanted to cancel the operation asynchronously we would just call >> Stream.close() and that should work with any stream. In the current >> case we can't stop a stream from continuing to execute without >> explicitly adding a forEachUntil() at the end of it with a condition >> variable that we then change out of band. Also, since there may be >> reduction operations in the middle, that may not even stop all of >> those operations from completing. This can be especially bad for >> things that should timeout. >> >> Sam >> >> On Sat, Feb 9, 2013 at 11:36 AM, Joe Bowbeer >> wrote: >>> >>> I haven't used either of these. If I wanted to create an example I'd >>> probably start with a stream of lines() from a BufferedReader, and then >>> tack >>> on a use case. 
>>> >>> try (BufferedReader r = Files.newBufferedReader(path, >>> Charset.defaultCharset())) { >>> r.lines().forEachUntil(...); >>> } >>> >>> >>> Do you have something specific in mind? >>> >>> >>> On Sat, Feb 9, 2013 at 11:26 AM, Sam Pullara wrote: >>>> >>>> Now that we are further along, I wanted to bring this up again. I >>>> don't think that forEachUntil is sufficient for handling internal and >>>> external conditions that should truncate stream processing. I've also >>>> looked at CloseableStream and that doesn't appear to help since it >>>> isn't possible to wrap a Stream (say an infinite stream) with a >>>> CloseableStream and get the necessary semantics of cancellation. Also, >>>> other APIs that don't consider that you might give them a >>>> CloseableStream will likely still give you back a Stream thus losing >>>> the semantics. >>>> >>>> Is everyone else happy with forEachUntil and CloseableStream? >>>> >>>> Sam >>>> >>>> ---------- Forwarded message ---------- >>>> From: Sam Pullara >>>> Date: Mon, Dec 31, 2012 at 8:34 AM >>>> Subject: Re: Cancelation -- use cases >>>> To: Brian Goetz >>>> Cc: "lambda-libs-spec-experts at openjdk.java.net" >>>> >>>> >>>> I think we are conflating two things with this solution and it doesn't >>>> work for them in my mind. Here is what I would like the solution to >>>> cover: >>>> >>>> - External conditions (cancellation, cleanup) >>>> - Internal conditions (gating based on count, elements and results) >>>> >>>> The first one may be the only one that works in the parallel case. It >>>> should likely be implemented with .close() on stream that would stop >>>> the stream as soon as possible. This would be useful for things like >>>> timeouts. Kind of like calling close on an inputstream in the middle >>>> of reading it. The other one I think is necessary and hard to >>>> implement correctly with the parallel case. For instance I would like >>>> to say: >>>> >>>> stream.gate(e -> e < 10).forEach(e -> ?) 
>>>> >>>> OR >>>> >>>> stream.gate( (e, i) -> e < 10 || i > 10).forEach(e -> ?) // i is the >>>> number of the current element >>>> >>>> That should give me every element in the stream until an element isn't >>>> < 10 and then stop processing elements. Further, there should be some >>>> way for the stream source to be notified that we are done consuming it >>>> in case it is of unknown length or consumes resources. That would be >>>> more like (assuming we add a Runnable call to Timer): >>>> >>>> Stream stream = ?. >>>> new Timer().schedule(() -> stream.close(), 5000); >>>> stream.forEach(e -> ?.); >>>> >>>> OR >>>> >>>> stream.forEach(e -> try { ? } catch() { stream.close() } ); >>>> >>>> Sadly, the first gate() case doesn't work well when parallelized. I'm >>>> willing to just specify what the behavior is for that case to get it >>>> into the API. For example, I would probably say something like "the >>>> gate will need to return false once per split to stop processing". In >>>> either of these cases I think one of the motivations needs to be that >>>> the stream may be using resources and we need to tell the source that >>>> we are done consuming it. For example, if the stream is sourced from a >>>> file, database or even a large amount of memory there should be a >>>> notification mechanism for doneness that will allow those resources to >>>> be returned before the stream is exhausted. To that end I think that >>>> Stream should implement AutoCloseable but overridden with no checked >>>> exception. >>>> >>>> interface Stream implements AutoCloseable { >>>> /** >>>> * Closes this stream and releases any system resources associated >>>> * with it. If the stream is already closed then invoking this >>>> * method has no effect. Close is automatically called when the >>>> * stream is exhausted. After this is called, no further elements >>>> * will be processed by the stream but currently processing elements >>>> * will complete normally. 
Calling other methods on a closed stream >>>> will >>>> * produce IllegalStateExceptions. >>>> */ >>>> void close(); >>>> >>>> /** >>>> * When the continueProcessing function returns false, no further >>>> * elements will be processed after the gate. In the parallel stream >>>> * case no further elements will be processed in the current split. >>>> */ >>>> Stream gate(Function until); >>>> >>>> /** >>>> * As gate with the addition of the current element number. >>>> */ >>>> Stream gate(BiFunction until); >>>> } >>>> >>>> This API avoids a lot of side effects that forEachUntil would require >>>> implement these use cases. >>>> >>>> Sam >>>> >>>> On Dec 30, 2012, at 7:53 PM, Brian Goetz wrote: >>>> >>>> Here's a lower-complexity version of cancel, that still satisfies (in >>>> series or in parallel) use cases like the following: >>>> >>>>> - Find the best possible move after thinking for 5 seconds >>>>> - Find the first solution that is better than X >>>>> - Gather solutions until we have 100 of them >>>> >>>> without bringing in the complexity or time/space overhead of dealing >>>> with encounter order. >>>> >>>> Since the forEach() operation works exclusively on the basis of >>>> temporal/arrival order rather than spatial/encounter order (elements >>>> are passed to the lambda in whatever order they are available, in >>>> whatever thread they are available), we could make a canceling variant >>>> of forEach: >>>> >>>> .forEachUntil(Block sink, BooleanSupplier until) >>>> >>>> Here, there is no confusion about what happens in the ordered case, no >>>> need to buffer elements, etc. Elements flow into the block until the >>>> termination condition transpires, at which point there are no more >>>> splits and existing splits dispense no more elements. 
>>>> >>>> I implemented this (it was trivial) and wrote a simple test program to >>>> calculate primes sequentially and in parallel, counting how many could >>>> be calculated in a fixed amount of time, starting from an infinite >>>> generator and filtering out composites: >>>> >>>> Streams.iterate(from, i -> i + 1) // sequential >>>> .filter(i -> isPrime(i)) >>>> .forEachUntil(i -> { >>>> chm.put(i, true); >>>> }, () -> System.currentTimeMillis() >= start+num); >>>> >>>> vs >>>> >>>> Streams.iterate(from, i -> i+1) // parallel >>>> .parallel() >>>> .filter(i -> isPrime(i)) >>>> .forEachUntil(i -> { >>>> chm.put(i, true); >>>> }, () -> System.currentTimeMillis() >= start+num); >>>> >>>> On a 4-core Q6600 system, in a fixed amount of time, the parallel >>>> version gathered ~3x as many primes. >>>> >>>> In terms of being able to perform useful computations on infinite >>>> streams, this seems a pretty attractive price-performer; lower spec >>>> and implementation complexity, and covers many of the use cases which >>>> would otherwise be impractical to attack with the stream approach. >>>> >>>> >>>> >>>> On 12/28/2012 11:20 AM, Brian Goetz wrote: >>>> >>>> I've been working through some alternatives for cancellation support in >>>> infinite streams. Looking to gather some use case background to help >>>> evaluate the alternatives. >>>> >>>> In the serial case, the "gate" approach works fine -- after some >>>> criteria transpires, stop sending elements downstream. The pipeline >>>> flushes the elements it has, and completes early. >>>> >>>> In the parallel unordered case, the gate approach similarly works fine >>>> -- after the cancelation criteria occurs, no new splits are created, and >>>> existing splits dispense no more elements. The computation similarly >>>> quiesces after elements currently being processed are completed, >>>> possibly along with any up-tree merging to combine results. >>>> >>>> It is the parallel ordered case that is tricky. 
Supposing we partition >>>> a stream into >>>> (a1,a2,a3), (a4,a5,a6) >>>> >>>> And suppose further we happen to be processing a5 when the bell goes >>>> off. Do we want to wait for all a_i, i<5, to finish before letting the >>>> computation quiesce? >>>> >>>> My gut says: for the things we intend to cancel, most of them will be >>>> order-insensitive anyway. Things like: >>>> >>>> - Find the best possible move after thinking for 5 seconds >>>> - Find the first solution that is better than X >>>> - Gather solutions until we have 100 of them >>>> >>>> I believe the key use case for cancelation here will be when we are >>>> chewing on potentially infinite streams of events (probably backed by >>>> IO) where we want to chew until we're asked to shut down, and want to >>>> get as much parallelism as we can cheaply. Which suggests to me the >>>> intersection between order-sensitive stream pipelines and cancelable >>>> stream pipelines is going to be pretty small indeed. >>>> >>>> Anyone want to add to this model of use cases for cancelation? >>> >>> > From zhong.j.yu at gmail.com Sat Feb 9 20:25:12 2013 From: zhong.j.yu at gmail.com (Zhong Yu) Date: Sat, 9 Feb 2013 22:25:12 -0600 Subject: Internal and External truncation conditions In-Reply-To: References: Message-ID: Based on my own use cases, code that needs forEachUntil() usually intends to process just enough elements to produce a result, for example, a lexer scans a char stream until it yields a token. In that sense forEachUntil() is really an aggregator for *some* elements. We may have a method in the form of interface Stream R scan(Function scanner) The scanner is usually stateful. Elements are fed to the scanner, until it returns a non-null value; that value is the return value of scan(). If end of stream is reached before scanner returns non-null, scan() returns null. A scanner may need to react to EOF event, the application can design an EOF sentinel of type T. 
In the parallel case, scanner must be thread-safe; if it returns non-null for one split, it should return non-null for all splits at around the same time; one of the non-null values is chosen arbitrarily as the result of scan(). If null sentinel is too distasteful, scanner can return Optional; or it can yield result into a Consumer sink. Examples: Collection primes = ints.parallel().scan( gather primes till xxx ); Paragraph para = lines.scan( gather lines till an empty line or EOF ); scan() is only intended for part of the stream. To turn the whole stream into another stream, say a line stream into a paragraph stream, flatMap(FlatMapper) should work just fine. Zhong Yu On Sat, Feb 9, 2013 at 1:26 PM, Sam Pullara wrote: > Now that we are further along, I wanted to bring this up again. I > don't think that forEachUntil is sufficient for handling internal and > external conditions that should truncate stream processing. I've also > looked at CloseableStream and that doesn't appear to help since it > isn't possible to wrap a Stream (say an infinite stream) with a > CloseableStream and get the necessary semantics of cancellation. Also, > other APIs that don't consider that you might give them a > CloseableStream will likely still give you back a Stream thus losing > the semantics. > > Is everyone else happy with forEachUntil and CloseableStream? > > Sam > > ---------- Forwarded message ---------- > From: Sam Pullara > Date: Mon, Dec 31, 2012 at 8:34 AM > Subject: Re: Cancelation -- use cases > To: Brian Goetz > Cc: "lambda-libs-spec-experts at openjdk.java.net" > > > I think we are conflating two things with this solution and it doesn't > work for them in my mind. Here is what I would like the solution to > cover: > > - External conditions (cancellation, cleanup) > - Internal conditions (gating based on count, elements and results) > > The first one may be the only one that works in the parallel case. 
It > should likely be implemented with .close() on stream that would stop > the stream as soon as possible. This would be useful for things like > timeouts. Kind of like calling close on an inputstream in the middle > of reading it. The other one I think is necessary and hard to > implement correctly with the parallel case. For instance I would like > to say: > > stream.gate(e -> e < 10).forEach(e -> ?) > > OR > > stream.gate( (e, i) -> e < 10 || i > 10).forEach(e -> ?) // i is the > number of the current element > > That should give me every element in the stream until an element isn't > < 10 and then stop processing elements. Further, there should be some > way for the stream source to be notified that we are done consuming it > in case it is of unknown length or consumes resources. That would be > more like (assuming we add a Runnable call to Timer): > > Stream stream = ?. > new Timer().schedule(() -> stream.close(), 5000); > stream.forEach(e -> ?.); > > OR > > stream.forEach(e -> try { ? } catch() { stream.close() } ); > > Sadly, the first gate() case doesn't work well when parallelized. I'm > willing to just specify what the behavior is for that case to get it > into the API. For example, I would probably say something like "the > gate will need to return false once per split to stop processing". In > either of these cases I think one of the motivations needs to be that > the stream may be using resources and we need to tell the source that > we are done consuming it. For example, if the stream is sourced from a > file, database or even a large amount of memory there should be a > notification mechanism for doneness that will allow those resources to > be returned before the stream is exhausted. To that end I think that > Stream should implement AutoCloseable but overridden with no checked > exception. > > interface Stream implements AutoCloseable { > /** > * Closes this stream and releases any system resources associated > * with it. 
If the stream is already closed then invoking this > * method has no effect. Close is automatically called when the > * stream is exhausted. After this is called, no further elements > * will be processed by the stream but currently processing elements > * will complete normally. Calling other methods on a closed stream will > * produce IllegalStateExceptions. > */ > void close(); > > /** > * When the continueProcessing function returns false, no further > * elements will be processed after the gate. In the parallel stream > * case no further elements will be processed in the current split. > */ > Stream gate(Function until); > > /** > * As gate with the addition of the current element number. > */ > Stream gate(BiFunction until); > } > > This API avoids a lot of side effects that forEachUntil would require > implement these use cases. > > Sam > > On Dec 30, 2012, at 7:53 PM, Brian Goetz wrote: > > Here's a lower-complexity version of cancel, that still satisfies (in > series or in parallel) use cases like the following: > >> - Find the best possible move after thinking for 5 seconds >> - Find the first solution that is better than X >> - Gather solutions until we have 100 of them > > without bringing in the complexity or time/space overhead of dealing > with encounter order. > > Since the forEach() operation works exclusively on the basis of > temporal/arrival order rather than spatial/encounter order (elements > are passed to the lambda in whatever order they are available, in > whatever thread they are available), we could make a canceling variant > of forEach: > > .forEachUntil(Block sink, BooleanSupplier until) > > Here, there is no confusion about what happens in the ordered case, no > need to buffer elements, etc. Elements flow into the block until the > termination condition transpires, at which point there are no more > splits and existing splits dispense no more elements. 
> > I implemented this (it was trivial) and wrote a simple test program to > calculate primes sequentially and in parallel, counting how many could > be calculated in a fixed amount of time, starting from an infinite > generator and filtering out composites: > > Streams.iterate(from, i -> i + 1) // sequential > .filter(i -> isPrime(i)) > .forEachUntil(i -> { > chm.put(i, true); > }, () -> System.currentTimeMillis() >= start+num); > > vs > > Streams.iterate(from, i -> i+1) // parallel > .parallel() > .filter(i -> isPrime(i)) > .forEachUntil(i -> { > chm.put(i, true); > }, () -> System.currentTimeMillis() >= start+num); > > On a 4-core Q6600 system, in a fixed amount of time, the parallel > version gathered ~3x as many primes. > > In terms of being able to perform useful computations on infinite > streams, this seems a pretty attractive price-performer; lower spec > and implementation complexity, and covers many of the use cases which > would otherwise be impractical to attack with the stream approach. > > > > On 12/28/2012 11:20 AM, Brian Goetz wrote: > > I've been working through some alternatives for cancellation support in > infinite streams. Looking to gather some use case background to help > evaluate the alternatives. > > In the serial case, the "gate" approach works fine -- after some > criteria transpires, stop sending elements downstream. The pipeline > flushes the elements it has, and completes early. > > In the parallel unordered case, the gate approach similarly works fine > -- after the cancelation criteria occurs, no new splits are created, and > existing splits dispense no more elements. The computation similarly > quiesces after elements currently being processed are completed, > possibly along with any up-tree merging to combine results. > > It is the parallel ordered case that is tricky. Supposing we partition > a stream into > (a1,a2,a3), (a4,a5,a6) > > And suppose further we happen to be processing a5 when the bell goes > off. 
Do we want to wait for all a_i, i<5, to finish before letting the > computation quiesce? > > My gut says: for the things we intend to cancel, most of them will be > order-insensitive anyway. Things like: > > - Find the best possible move after thinking for 5 seconds > - Find the first solution that is better than X > - Gather solutions until we have 100 of them > > I believe the key use case for cancelation here will be when we are > chewing on potentially infinite streams of events (probably backed by > IO) where we want to chew until we're asked to shut down, and want to > get as much parallelism as we can cheaply. Which suggests to me the > intersection between order-sensitive stream pipelines and cancelable > stream pipelines is going to be pretty small indeed. > > Anyone want to add to this model of use cases for cancelation? From zhong.j.yu at gmail.com Sat Feb 9 20:59:10 2013 From: zhong.j.yu at gmail.com (Zhong Yu) Date: Sat, 9 Feb 2013 22:59:10 -0600 Subject: Internal and External truncation conditions In-Reply-To: References: Message-ID: On Sat, Feb 9, 2013 at 10:25 PM, Zhong Yu wrote: > Based on my own use cases, code that needs forEachUntil() usually > intends to process just enough elements to produce a result, for > example, a lexer scans a char stream until it yields a token. In that > sense forEachUntil() is really an aggregator for *some* elements. We > may have a method in the form of > > interface Stream > > R scan(Function scanner) > > The scanner is usually stateful. Elements are fed to the scanner, > until it returns a non-null value; that value is the return value of > scan(). If end of stream is reached before scanner returns non-null, > scan() returns null. A scanner may need to react to EOF event, the > application can design an EOF sentinel of type T. 
> > In the parallel case, scanner must be thread-safe; if it returns > non-null for one split, it should return non-null for all splits at > around the same time; one of the non-null values is chosen arbitrarily > as the result of scan(). > > If null sentinel is too distasteful, scanner can return Optional; > or it can yield result into a Consumer sink. > > Examples: > > Collection primes = ints.parallel().scan( gather primes till xxx ); > > Paragraph para = lines.scan( gather lines till an empty line or EOF ); > > scan() is only intended for part of the stream. To turn the whole > stream into another stream, say a line stream into a paragraph stream, > flatMap(FlatMapper) should work just fine. Actually, scan() can be defined in term of flatMap(FlatMapper mapper).findFirst() the mapper is stateful; it gathers some elements then yields a result to the sink. The scan() method, though providing the same functionality, is more clear about the intention of the programmer. Zhong Yu From dl at cs.oswego.edu Sun Feb 10 05:12:18 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 10 Feb 2013 08:12:18 -0500 Subject: Internal and External truncation conditions In-Reply-To: <5116DAB7.70809@univ-mlv.fr> References: <5116DAB7.70809@univ-mlv.fr> Message-ID: <51179CB2.3000502@cs.oswego.edu> On 02/09/13 18:24, Remi Forax wrote: > if forEachUntil takes a function that return a boolean, it's easy. > > try (BufferedReader r = Files.newBufferedReader(path, Charset.defaultCharset())) { > return r.lines().parallel().forEachWhile(element -> { > if (regex.matcher(line).matches()) { > return false; > } > ...process the line > return true; > } > } > Which then becomes a variant of what I do in ConcurrentHashMap search{InParallel,Sequentially}, that applies to not only this but several other usage contexts: /** * Returns a non-null result from applying the given search * function on each (key, value), or null if none. 
Upon * success, further element processing is suppressed and the * results of any other parallel invocations of the search * function are ignored. * * @param searchFunction a function returning a non-null * result on success, else null * @return a non-null result from applying the given search * function on each (key, value), or null if none */ You'd use this here with a function that processed if a match (returning null) else returning the first non-match. Or rework in any of a couple of ways to similar effect. This works well in CHM because of its nullness policy. Which allows only this single method to serve as the basis for all possible short-circuit/cancel applications. It is so handy when nulls cannot be actual elements that it might be worth supporting instead of forEachUntil? People using it would need to ensure non-null elements. Just a thought. While I'm at it: Sam seems to be asking for asynchronous cancellation of bulk operations. I can't get myself to appreciate the utility of doing this. JDK/j.u.c supports several other ways (especially including the upcoming CompletableFutures) to carefully yet relatively conveniently arrange/manage cancellation, especially in IO-related contexts in which they most often arise. None of them explicitly address bulk computations (although any of them can do a bulk computation within a task). This is a feature, not a bug. If you are processing lots of elements, then only you know the responsiveness vs overhead tradeoffs of checking for async cancel status. Requiring that all Stream bulk computations like reduce continuously check for async cancel status between each per-element operation is unlikely to satisfy anyone at all, yet seems to be the only defensible option if we were to support it. 
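The search idiom Doug describes shipped in the Java 8 API as ConcurrentHashMap.search(parallelismThreshold, searchFunction); a small sketch of the non-null convention (the helper name is illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

public class ChmSearchDemo {
    // Returns a key mapped to the wanted value, or null if none.
    // A non-null return from the search function suppresses further
    // element processing; traversal order is unspecified.
    static Integer findKeyWithValue(ConcurrentHashMap<Integer, String> map, String wanted) {
        return map.search(4L, (k, v) -> wanted.equals(v) ? k : null);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(findKeyWithValue(map, "value-42")); // 42
        System.out.println(findKeyWithValue(map, "missing"));  // null
    }
}
```

The parallelismThreshold argument (here 4) controls how large the map must be before the search is split across threads; Long.MAX_VALUE forces a sequential search.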
-Doug From forax at univ-mlv.fr Sun Feb 10 05:46:08 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 10 Feb 2013 14:46:08 +0100 Subject: Internal and External truncation conditions In-Reply-To: <51179CB2.3000502@cs.oswego.edu> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> Message-ID: <5117A4A0.5000208@univ-mlv.fr> On 02/10/2013 02:12 PM, Doug Lea wrote: > On 02/09/13 18:24, Remi Forax wrote: >> if forEachUntil takes a function that return a boolean, it's easy. >> >> try (BufferedReader r = Files.newBufferedReader(path, >> Charset.defaultCharset())) { >> return r.lines().parallel().forEachWhile(element -> { >> if (regex.matcher(line).matches()) { >> return false; >> } >> ...process the line >> return true; >> } >> } >> > > Which then becomes a variant of what I do in ConcurrentHashMap > search{InParallel,Sequentially}, that applies to not only this > but several other usage contexts: > > /** > * Returns a non-null result from applying the given search > * function on each (key, value), or null if none. Upon > * success, further element processing is suppressed and the > * results of any other parallel invocations of the search > * function are ignored. > * > * @param searchFunction a function returning a non-null > * result on success, else null > * @return a non-null result from applying the given search > * function on each (key, value), or null if none > */ > > You'd use this here with a function that processed if > a match (returning null) else returning the first non-match. > Or rework in any of a couple of ways to similar effect. > > This works well in CHM because of its nullness policy. > Which allows only this single method to serve as the basis > for all possible short-circuit/cancel applications. > It is so handy when nulls cannot be actual elements > that it might be worth supporting instead of forEachUntil? > People using it would need to ensure non-null elements. > Just a thought. 
yes, findFirst and forEachWhile/forEachUntil are the same operation from the implementation point of view if you have a value (not necessarily null) that says NO_VALUE. Now, I think it's an implementation detail and that from the user point of view we should provide them both. > > -Doug > > Rémi From forax at univ-mlv.fr Sun Feb 10 05:47:39 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 10 Feb 2013 14:47:39 +0100 Subject: Internal and External truncation conditions In-Reply-To: <5117A3D8.9040509@cs.oswego.edu> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> <5117A3D8.9040509@cs.oswego.edu> Message-ID: <5117A4FB.7010609@univ-mlv.fr> On 02/10/2013 02:42 PM, Doug Lea wrote: > On 02/10/13 08:12, Doug Lea wrote: > >> Requiring that all Stream bulk computations like reduce >> continuously check for async cancel status between each >> per-element operation is unlikely to satisfy anyone at all, >> yet seems to be the only defensible option if we were to >> support it. >> > > Actually, we already support it. > Any per-element lambda supplied to any Stream method can > itself do any kind of async cancel check itself, and throw > an exception rather than returning a result. 
>> >> Case closed? > > No, throwing an exception when the VM thinks that it can escape is really slow. > > -Doug > > Rémi From dl at cs.oswego.edu Sun Feb 10 06:02:21 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 10 Feb 2013 09:02:21 -0500 Subject: Internal and External truncation conditions In-Reply-To: <5117A4FB.7010609@univ-mlv.fr> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> <5117A3D8.9040509@cs.oswego.edu> <5117A4FB.7010609@univ-mlv.fr> Message-ID: <5117A86D.8090707@cs.oswego.edu> On 02/10/13 08:47, Remi Forax wrote: > On 02/10/2013 02:42 PM, Doug Lea wrote: > >>> Any per-element lambda supplied to any Stream method can >>> itself do any kind of async cancel check itself, and throw >>> an exception rather than returning a result. >>> >>> Case closed? >> >> No, throwing an exception when the VM thinks that it can escape >> is really slow. >> > > That's my point exactly! If you want to slow down bulk ops > for the sake of responsiveness, then you should be aware of > the tradeoffs. In practice, fine-grained cancel-checks > are rarely worthwhile (you'd often finish 10 times faster, > and thus usually not need to cancel, without the checks). > But it should be the user's decision, not ours. > Otherwise, we cannot internally arrange/support > cancellation any faster than users can, but would > penalize ALL users for the sake of those who need it. 
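Doug's point that users can already arrange cancellation themselves can be sketched as a per-element check against a flag, aborting via an unchecked exception; the exception class and helper below are illustrative, not a JDK API:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.IntStream;

public class UserCancelDemo {
    // Illustrative unchecked exception used to abort a bulk operation.
    static class CancelledException extends RuntimeException {}

    // Processes elements from an infinite stream until the flag is set,
    // then bails out of forEach by throwing.
    static int countUntilCancelled(AtomicBoolean cancelled, int limit) {
        int[] seen = {0};
        try {
            IntStream.iterate(0, i -> i + 1).forEach(i -> {
                if (cancelled.get()) throw new CancelledException();
                seen[0]++;
                if (seen[0] >= limit) cancelled.set(true); // simulate an async cancel
            });
        } catch (CancelledException e) {
            // computation quiesces here
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(countUntilCancelled(new AtomicBoolean(), 100)); // 100
    }
}
```

This is exactly the trade-off under discussion: the per-element check and the thrown exception cost something, but only the pipelines that opt in pay for it.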
yes, you can; it's exactly what j.l.i.SwitchPoint does.
Note that we can't transform the whole pipeline to a method handle tree because we have no loopy method handle now, but if we have that, you can create a method handle tree corresponding to the pipeline, and when the code is JITed (with the new lambda forms, it will be), the check will disappear; and if a user calls cancel, the JITed code will be trashed and the execution will go back into the interpreter, which will do the check.

> -Doug

Rémi

From dl at cs.oswego.edu Sun Feb 10 06:42:56 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 10 Feb 2013 09:42:56 -0500
Subject: Internal and External truncation conditions
In-Reply-To: <5117AE83.30206@univ-mlv.fr>
References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> <5117A3D8.9040509@cs.oswego.edu> <5117A4FB.7010609@univ-mlv.fr> <5117A86D.8090707@cs.oswego.edu> <5117AE83.30206@univ-mlv.fr>
Message-ID: <5117B1F0.6060806@cs.oswego.edu>

On 02/10/13 09:28, Remi Forax wrote:
> yes, you can; it's exactly what j.l.i.SwitchPoint does.
> Note that we can't transform the whole pipeline to a method handle tree because
> we have no loopy method handle now, but if we have that, you can create a method
> handle tree corresponding to the pipeline, and when the code is JITed (with the
> new lambda forms, it will be), the check will disappear; and if a user calls
> cancel, the JITed code will be trashed and the execution will go back into the
> interpreter, which will do the check.

Which amounts to, at best, an approximation of the rare-trap mechanics that would be used for an explicit check in user code if the handles are fully resolved?
-Doug From dl at cs.oswego.edu Sun Feb 10 05:42:48 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 10 Feb 2013 08:42:48 -0500 Subject: Internal and External truncation conditions In-Reply-To: <51179CB2.3000502@cs.oswego.edu> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> Message-ID: <5117A3D8.9040509@cs.oswego.edu> On 02/10/13 08:12, Doug Lea wrote: > Requiring that all Stream bulk computations like reduce > continuously check for async cancel status between each > per-element operation is unlikely to satisfy anyone at all, > yet seems to be the only defensible option if we were to > support it. > Actually, we already support it. Any per-element lambda supplied to any Stream method can itself do any kind of async cancel check itself, and throw an exception rather than returning a result. Case closed? -Doug From zhong.j.yu at gmail.com Sun Feb 10 08:30:33 2013 From: zhong.j.yu at gmail.com (Zhong Yu) Date: Sun, 10 Feb 2013 10:30:33 -0600 Subject: Internal and External truncation conditions In-Reply-To: <51179CB2.3000502@cs.oswego.edu> References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> Message-ID: On Sun, Feb 10, 2013 at 7:12 AM, Doug Lea
wrote:
> On 02/09/13 18:24, Remi Forax wrote:
>>
>> if forEachUntil takes a function that returns a boolean, it's easy.
>>
>> try (BufferedReader r = Files.newBufferedReader(path,
>>     Charset.defaultCharset())) {
>>   return r.lines().parallel().forEachWhile(line -> {
>>     if (regex.matcher(line).matches()) {
>>       return false;
>>     }
>>     ...process the line
>>     return true;
>>   }
>> }
>
> Which then becomes a variant of what I do in ConcurrentHashMap
> search{InParallel,Sequentially}, that applies to not only this
> but several other usage contexts:
>
> /**
>  * Returns a non-null result from applying the given search
>  * function on each (key, value), or null if none. Upon
>  * success, further element processing is suppressed and the
>  * results of any other parallel invocations of the search
>  * function are ignored.
>  *
>  * @param searchFunction a function returning a non-null
>  * result on success, else null
>  * @return a non-null result from applying the given search
>  * function on each (key, value), or null if none
>  */
>
> You'd use this here with a function that processed if
> a match (returning null), else returned the first non-match.
> Or rework in any of a couple of ways to similar effect.
>
> This works well in CHM because of its nullness policy.
> Which allows only this single method to serve as the basis
> for all possible short-circuit/cancel applications.
> It is so handy when nulls cannot be actual elements
> that it might be worth supporting instead of forEachUntil?
> People using it would need to ensure non-null elements.
> Just a thought.

null is fine if we use Optional: Optional search(Function)

> While I'm at it:
>
> Sam seems to be asking for asynchronous cancellation of bulk
> operations. I can't get myself to appreciate the utility of
> doing this.
JDK/j.u.c supports several other ways (especially > including the upcoming CompletableFutures) to carefully yet > relatively conveniently arrange/manage cancellation, especially > in IO-related contexts in which they most often arise. None > of them explicitly address bulk computations (although any > of them can do a bulk computation within a task). This is > a feature, not a bug. If you are processing lots > of elements, then only you know the responsiveness vs > overhead tradeoffs of checking for async cancel status. > > Requiring that all Stream bulk computations like reduce > continuously check for async cancel status between each > per-element operation is unlikely to satisfy anyone at all, > yet seems to be the only defensible option if we were to > support it. > > -Doug > > From forax at univ-mlv.fr Sun Feb 10 09:25:20 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 10 Feb 2013 18:25:20 +0100 Subject: Spliterator.tryAdvance Message-ID: <5117D800.9090708@univ-mlv.fr> Playing a little bit with how findFirst/forEachUntil can be implemented on top of a Spliterator, I think that tryAdvance should be changed to be able to return a value produced in the middle of the consumer taken by tryAdvance. I would like to have tryAdvance to be like this: /** * Sentinel value used by tryAdvance to signal that there is no more element. */ public static final Object END = new Object(); /** * If a remaining element exists, performs the given action on it, * returning the result of the function, otherwise returns {@code END}. * * @param action The action to perform. * @return {@code END} if no remaining elements existed * upon entry to this method, else the return value of the action. 
 */
Object tryAdvance(Function action);

and forEach is a little bit uglier:

Function action = element -> { consumer.accept(element); return null; };
do {} while (tryAdvance(action) != END);

with that, there is no need to use a side value to express the fact that we have already found the resulting value, because we can return it as the return value of tryAdvance.

cheers,
Rémi

From forax at univ-mlv.fr Sun Feb 10 09:33:24 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 10 Feb 2013 18:33:24 +0100
Subject: Internal and External truncation conditions
In-Reply-To: <5117B1F0.6060806@cs.oswego.edu>
References: <5116DAB7.70809@univ-mlv.fr> <51179CB2.3000502@cs.oswego.edu> <5117A3D8.9040509@cs.oswego.edu> <5117A4FB.7010609@univ-mlv.fr> <5117A86D.8090707@cs.oswego.edu> <5117AE83.30206@univ-mlv.fr> <5117B1F0.6060806@cs.oswego.edu>
Message-ID: <5117D9E4.6040008@univ-mlv.fr>

On 02/10/2013 03:42 PM, Doug Lea wrote:
> On 02/10/13 09:28, Remi Forax wrote:
>
>> yes, you can; it's exactly what j.l.i.SwitchPoint does.
>> Note that we can't transform the whole pipeline to a method handle
>> tree because
>> we have no loopy method handle now, but if we have that, you can
>> create a method
>> handle tree corresponding to the pipeline, and when the code is JITed
>> (with the new lambda forms, it will be), the check will disappear;
>> and if a user calls cancel,
>> the JITed code will be trashed and the execution will go back into the
>> interpreter, which will do the check.
>
> Which amounts to, at best, an approximation of the rare-trap mechanics
> that would be used for an explicit check in user code if the handles
> are fully resolved?

It depends on what "fully resolved handles" means and how the loopy method handle is implemented. If it's implemented like OSR, there is a check when interpreting the code just before doing the backward jump.
When JITed, there is no supplementary cost because the JIT has to insert a GC safepoint check (a read to a well known page) and the cancellation mechanism can re-use the very same read.

> -Doug

Rémi

From dl at cs.oswego.edu Sun Feb 10 13:00:27 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 10 Feb 2013 16:00:27 -0500
Subject: Spliterator.tryAdvance
In-Reply-To: <5117D800.9090708@univ-mlv.fr>
References: <5117D800.9090708@univ-mlv.fr>
Message-ID: <51180A6B.4020503@cs.oswego.edu>

On 02/10/13 12:25, Remi Forax wrote:
> Playing a little bit with how findFirst/forEachUntil can be implemented on top
> of a Spliterator,
> I think that tryAdvance should be changed to be able to return a value produced
> in the middle of the consumer taken by tryAdvance.

Brian and I spent a while on this theme, of only supporting forEach and some variant of the search method I mentioned. If we had nonnull-element guarantees, it would be an easier call: just use CHM-like search. The primitive int/long/double versions would need a boxed return value but these could sometimes be optimized away in practice. All in all seems pretty good. But when you also allow nullable elements, it means that every call is guaranteed to create a nuisance object, which makes it less attractive than single-step tryAdvance as the basic workhorse underlying a lot of bulk computations.

> /**
>  * Sentinel value used by tryAdvance to signal that there is no more element.
>  */
> public static final Object END = new Object();

No can do. (Primitives.)
-Doug

From forax at univ-mlv.fr Sun Feb 10 14:19:31 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 10 Feb 2013 23:19:31 +0100
Subject: Spliterator.tryAdvance
In-Reply-To: <51180A6B.4020503@cs.oswego.edu>
References: <5117D800.9090708@univ-mlv.fr> <51180A6B.4020503@cs.oswego.edu>
Message-ID: <51181CF3.9000708@univ-mlv.fr>

On 02/10/2013 10:00 PM, Doug Lea wrote:
> On 02/10/13 12:25, Remi Forax wrote:
>> Playing a little bit with how findFirst/forEachUntil can be
>> implemented on top
>> of a Spliterator,
>> I think that tryAdvance should be changed to be able to return a
>> value produced
>> in the middle of the consumer taken by tryAdvance.
>
> Brian and I spent a while on this theme, of only supporting
> forEach and some variant of the search method I mentioned.
> If we had nonnull-element guarantees, it would be an easier call:
> just use CHM-like search. The primitive int/long/double
> versions would need a boxed return value but these could
> sometimes be optimized away in practice. All in all seems
> pretty good. But when you also allow nullable elements,
> it means that every call is guaranteed to create a nuisance
> object, which makes it less attractive than single-step
> tryAdvance as the basic workhorse underlying a lot of bulk
> computations.

You got it wrong, I think. What I propose is that tryAdvance is not a search-like operation but a single-step operation. And when it calls the action at the end, the action is able to back-propagate a resulting value. You can see it as a way to abstract the way a read on an input works: either you get the number of bytes read or you get -1. Here, with tryAdvance, either you get the return value of the action or you get END; you still have to call tryAdvance several times to consume the whole stream. About null: given that the return value can only come from the consumer, the return value can be null or not depending on what the user has specified as action.
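[Editor's note: the single-step, value-returning tryAdvance protocol Rémi describes can be sketched standalone. Everything below — the Cursor class, its method names, and the findFirstEven driver — is a hypothetical illustration of the idea, not the proposed Spliterator API.]

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class TryAdvanceSketch {
    /** Sentinel meaning "no more elements", as in Rémi's proposal. */
    static final Object END = new Object();

    // Minimal single-step cursor over a list: each call consumes at most one
    // element and returns whatever the action returned for it, or END.
    static class Cursor<T> {
        private final Iterator<T> it;
        Cursor(List<T> list) { this.it = list.iterator(); }

        Object tryAdvance(Function<? super T, ?> action) {
            if (!it.hasNext()) {
                return END;
            }
            return action.apply(it.next());
        }
    }

    // A findFirst-style search built on the value-returning tryAdvance:
    // the action returns the element on a match and null otherwise, so no
    // mutable side variable is needed to carry the result out of the lambda.
    static Integer findFirstEven(List<Integer> list) {
        Cursor<Integer> c = new Cursor<>(list);
        Object r;
        while ((r = c.tryAdvance(e -> e % 2 == 0 ? e : null)) != END) {
            if (r != null) {
                return (Integer) r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findFirstEven(Arrays.asList(1, 3, 4, 5, 6))); // 4
    }
}
```

Note this sketch dodges the primitive-specialization objection Doug raises by using boxed elements; it only illustrates how the sentinel lets a result flow back through the per-step call.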
>
>> /**
>>  * Sentinel value used by tryAdvance to signal that there is no
>>  * more element.
>>  */
>> public static final Object END = new Object();
>
> No can do. (Primitives.)

If the primitive value is one which is used in a reduce, yes, it's true, it can not do that, but anyway, you can not send the reduced value to the action too, or you need to create a new action at each call. Otherwise, the stream API uses Optional as box, so the stream API already requires the implementation to box the value.

> -Doug

Rémi

From forax at univ-mlv.fr Mon Feb 11 07:34:56 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 11 Feb 2013 16:34:56 +0100
Subject: Spliterator.tryAdvance
Message-ID: <51190FA0.3020601@univ-mlv.fr>

There is another point: the specification should be relaxed to allow tryAdvance to not always call the consumer taken as parameter.

If, for example, I want to implement a Spliterator that filters the elements, this implementation should be legal:

class FilterSpliterator<T> implements Spliterator<T> {
  private final Spliterator<T> spliterator;
  private final Predicate<? super T> predicate;

  public FilterSpliterator(Spliterator<T> spliterator, Predicate<? super T> predicate) {
    ....
  }

  public void tryAdvance(Consumer<T> consumer) {
    spliterator.tryAdvance(element -> {
      if (predicate.test(element)) {
        consumer.accept(element);
      }
    });
  }
}

otherwise, you have to use a while loop around spliterator.tryAdvance, but because there is no way to transmit the information that the element is accepted or not (see my previous mail), you can not use a lambda here and you have to rely on an inner class.
cheers,
Rémi

From brian.goetz at oracle.com Mon Feb 11 08:41:33 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 11 Feb 2013 11:41:33 -0500
Subject: Reducing reduce
Message-ID: <51191F3D.4090203@oracle.com>

Now that we've added all the shapes of map() to Stream (map to ref/int/long/double), and we've separated functional reduce (currently called reduce) from mutable reduce (currently called collect), I think that leaves room for taking out one of the reduce methods from Stream:

<U> U reduce(U identity,
             BiFunction<U, ? super T, U> accumulator,
             BinaryOperator<U> reducer);

This is the one that confuses everyone anyway, and I don't think we need it any more.

The arguments for having this form instead of discrete map+reduce are:
- fused map+reduce reduces boxing
- this three-arg form can also fold filtering into the accumulation

However, since we now have primitive-bearing map methods, and we can do filtering before and after the map, is this form really carrying its weight? Specifically because people find it counterintuitive, we should consider dropping it and guiding people towards map+reduce.

For example, "sum of pages" over a stream of Documents is better written as:

docs.map(Document::getPageCount).sum()

rather than

docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum)

The big place where we need three-arg reduce is when we're folding into a mutable store. But that's now handled by collect().

Have I missed any use cases that would justify keeping this form?

From kevinb at google.com Mon Feb 11 08:55:06 2013
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 11 Feb 2013 08:55:06 -0800
Subject: Reducing reduce
In-Reply-To: <51191F3D.4090203@oracle.com>
References: <51191F3D.4090203@oracle.com>
Message-ID: 

+1, please drop it.
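[Editor's note: the two forms Brian compares, side by side as runnable code. `Document` here is a stand-in class invented for the example, and the sketch spells the primitive-mapping op `mapToInt` (the name the released API settled on) rather than the prototype's `map`.]

```java
import java.util.Arrays;
import java.util.List;

public class ReduceForms {
    static class Document {
        final int pageCount;
        Document(int pageCount) { this.pageCount = pageCount; }
        int getPageCount() { return pageCount; }
    }

    // Discrete map+reduce: map each Document to a primitive int, then sum.
    static int sumPagesMapped(List<Document> docs) {
        return docs.stream().mapToInt(Document::getPageCount).sum();
    }

    // The three-arg reduce form under discussion: identity, accumulator
    // (folds one Document into the running int), and combiner.
    static int sumPagesReduced(List<Document> docs) {
        return docs.stream().reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum);
    }

    public static void main(String[] args) {
        List<Document> docs = Arrays.asList(new Document(3), new Document(5), new Document(7));
        System.out.println(sumPagesMapped(docs));  // 15
        System.out.println(sumPagesReduced(docs)); // 15
    }
}
```

Both forms compute the same sum; the debate below is about whether the fused form's ability to elide work per element justifies its extra conceptual weight.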
On Mon, Feb 11, 2013 at 8:41 AM, Brian Goetz wrote: > Now that we've added all the shapes of map() to Stream (map to > ref/int/long/double), and we've separated functional reduce (currently > called reduce) from mutable reduce (currently called collect), I think that > leaves room for taking out one of the reduce methods from Stream: > > U reduce(U identity, > BiFunction accumulator, > BinaryOperator reducer); > > This is the one that confuses everyone anyway, and I don't think we need > it any more. > > The argument for having this form instead of discrete map+reduce are: > - fused map+reduce reduces boxing > - this three-arg form can also fold filtering into the accumulation > > However, since we now have primitive-bearing map methods, and we can do > filtering before and after the map, is this form really carrying its > weight? Specifically because people find it counterintuitive, we should > consider dropping it and guiding people towards map+reduce. > > For example, "sum of pages" over a stream of Documents is better written > as: > > docs.map(Document::**getPageCount).sum() > > rather than > > docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) > > The big place where we need three-arg reduce is when we're folding into a > mutable store. But that's now handled by collect(). > > Have I missed any use cases that would justify keeping this form? > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From joe.bowbeer at gmail.com Mon Feb 11 08:57:41 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 11 Feb 2013 08:57:41 -0800 Subject: Reducing reduce In-Reply-To: <51191F3D.4090203@oracle.com> References: <51191F3D.4090203@oracle.com> Message-ID: My parallel string-compare sample provides two implementations (below). Will both of these survive your changes? 
bitbucket.org/joebowbeer/stringcompare int compareMapReduce(String s1, String s2) { assert s1.length() == s2.length(); return intRange(0, s1.length()).parallel() .map(i -> compare(s1.charAt(i), s2.charAt(i))) .reduce(0, (l, r) -> (l != 0) ? l : r); } int compareBoxedReduce(String s1, String s2) { assert s1.length() == s2.length(); return intRange(0, s1.length()).parallel().boxed() .reduce(0, (l, i) -> (l != 0) ? l : compare(s1.charAt(i), s2.charAt(i)), (l, r) -> (l != 0) ? l : r); } The person who sold me the second form told me it would "burn less heat". He said that I could optimize my map/reduce by having it "not even calculate *f* if the left operand is nonzero, by combining the map and reduce steps into a fold." What is that person going to sell me now? Joe On Mon, Feb 11, 2013 at 8:41 AM, Brian Goetz wrote: > Now that we've added all the shapes of map() to Stream (map to > ref/int/long/double), and we've separated functional reduce (currently > called reduce) from mutable reduce (currently called collect), I think that > leaves room for taking out one of the reduce methods from Stream: > > U reduce(U identity, > BiFunction accumulator, > BinaryOperator reducer); > > This is the one that confuses everyone anyway, and I don't think we need > it any more. > > The argument for having this form instead of discrete map+reduce are: > - fused map+reduce reduces boxing > - this three-arg form can also fold filtering into the accumulation > > However, since we now have primitive-bearing map methods, and we can do > filtering before and after the map, is this form really carrying its > weight? Specifically because people find it counterintuitive, we should > consider dropping it and guiding people towards map+reduce. 
> > For example, "sum of pages" over a stream of Documents is better written > as: > > docs.map(Document::**getPageCount).sum() > > rather than > > docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) > > The big place where we need three-arg reduce is when we're folding into a > mutable store. But that's now handled by collect(). > > Have I missed any use cases that would justify keeping this form? > > From brian.goetz at oracle.com Mon Feb 11 09:12:01 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 11 Feb 2013 12:12:01 -0500 Subject: Reducing reduce In-Reply-To: References: <51191F3D.4090203@oracle.com> Message-ID: <51192661.2010501@oracle.com> Thanks, Joe. I knew I was missing some use cases. This is definitely a case where the fused version is more efficient, since it can elide some work based on the previous comparison state. On 2/11/2013 11:57 AM, Joe Bowbeer wrote: > My parallel string-compare sample provides two implementations (below). > > Will both of these survive your changes? > > bitbucket.org/joebowbeer/stringcompare > > > int compareMapReduce(String s1, String s2) { > assert s1.length() == s2.length(); > return intRange(0, s1.length()).parallel() > .map(i -> compare(s1.charAt(i), s2.charAt(i))) > .reduce(0, (l, r) -> (l != 0) ? l : r); > } > > int compareBoxedReduce(String s1, String s2) { > assert s1.length() == s2.length(); > return intRange(0, s1.length()).parallel().boxed() > .reduce(0, (l, i) -> (l != 0) ? l : compare(s1.charAt(i), s2.charAt(i)), > (l, r) -> (l != 0) ? l : r); > } > > > > > The person who sold me the second form told me it would "burn less > heat". He said that I could optimize my map/reduce by having it "not > even calculate *f* if the left operand is nonzero, by combining the map > and reduce steps into a fold." > > What is that person going to sell me now? 
> > Joe > > > On Mon, Feb 11, 2013 at 8:41 AM, Brian Goetz > wrote: > > Now that we've added all the shapes of map() to Stream (map to > ref/int/long/double), and we've separated functional reduce > (currently called reduce) from mutable reduce (currently called > collect), I think that leaves room for taking out one of the reduce > methods from Stream: > > U reduce(U identity, > BiFunction accumulator, > BinaryOperator reducer); > > This is the one that confuses everyone anyway, and I don't think we > need it any more. > > The argument for having this form instead of discrete map+reduce are: > - fused map+reduce reduces boxing > - this three-arg form can also fold filtering into the accumulation > > However, since we now have primitive-bearing map methods, and we can > do filtering before and after the map, is this form really carrying > its weight? Specifically because people find it counterintuitive, > we should consider dropping it and guiding people towards map+reduce. > > For example, "sum of pages" over a stream of Documents is better > written as: > > docs.map(Document::__getPageCount).sum() > > rather than > > docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) > > The big place where we need three-arg reduce is when we're folding > into a mutable store. But that's now handled by collect(). > > Have I missed any use cases that would justify keeping this form? > > From paul.sandoz at oracle.com Mon Feb 11 09:43:09 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 11 Feb 2013 18:43:09 +0100 Subject: Reducing reduce In-Reply-To: <51192661.2010501@oracle.com> References: <51191F3D.4090203@oracle.com> <51192661.2010501@oracle.com> Message-ID: <26AD1E48-38C6-427B-AE75-8B83E440D95D@oracle.com> On Feb 11, 2013, at 6:12 PM, Brian Goetz wrote: > Thanks, Joe. I knew I was missing some use cases. This is definitely a case where the fused version is more efficient, since it can elide some work based on the previous comparison state. 
>

And, efficiency-wise, it would be nice to avoid the boxed().

Paul.

> On 2/11/2013 11:57 AM, Joe Bowbeer wrote:
>> My parallel string-compare sample provides two implementations (below).
>>
>> Will both of these survive your changes?
>>
>> bitbucket.org/joebowbeer/stringcompare
>>
>> int compareMapReduce(String s1, String s2) {
>>   assert s1.length() == s2.length();
>>   return intRange(0, s1.length()).parallel()
>>     .map(i -> compare(s1.charAt(i), s2.charAt(i)))
>>     .reduce(0, (l, r) -> (l != 0) ? l : r);
>> }
>>
>> int compareBoxedReduce(String s1, String s2) {
>>   assert s1.length() == s2.length();
>>   return intRange(0, s1.length()).parallel().boxed()
>>     .reduce(0, (l, i) -> (l != 0) ? l : compare(s1.charAt(i), s2.charAt(i)),
>>       (l, r) -> (l != 0) ? l : r);
>> }

From dl at cs.oswego.edu Tue Feb 12 04:52:23 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 12 Feb 2013 07:52:23 -0500
Subject: Spliterator.tryAdvance
In-Reply-To: <51190FA0.3020601@univ-mlv.fr>
References: <51190FA0.3020601@univ-mlv.fr>
Message-ID: <511A3B07.1060906@cs.oswego.edu>

On 02/11/13 10:34, Remi Forax wrote:
> There is another point,
> the specification should be relaxed to allow tryAdvance to not always call the
> consumer taken as parameter.

These are all the same issue in disguise (including the one you mentioned that I didn't get :-)

The question is: Can you design Spliterators and/or related leaf-computation-level support such that none of the "basic" Stream ops require use of a lambda / inner class that needs a mutable variable?

I took this path in ConcurrentHashMap (see http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/ConcurrentHashMap.html), resulting in 4 "basic" methods (plus 3 more for primitives). I think it is the right solution for CHM, but it cannot apply to Streams (CHM can rely on nullness, and imposes requirement that users pre-fuse multiple map and map-reduce ops, etc.)
And if you explore what it would take to do this for the Stream API, it gets quickly out of hand -- at least a dozen or so operations that every Collection, Map, or other Stream/Spliterator source author would have to write. Which led to the present solution of only requiring forEach, trySplit, and tryAdvance.

-Doug

> If, for example, I want to implement a Spliterator that filters the elements,
> this implementation should be legal:
>
> class FilterSpliterator<T> implements Spliterator<T> {
>   private final Spliterator<T> spliterator;
>   private final Predicate<? super T> predicate;
>
>   public FilterSpliterator(Spliterator<T> spliterator, Predicate<? super T> predicate) {
>     ....
>   }
>
>   public void tryAdvance(Consumer<T> consumer) {
>     spliterator.tryAdvance(element -> {
>       if (predicate.test(element)) {
>         consumer.accept(element);
>       }
>     });
>   }
> }
>
> otherwise, you have to use a while loop around spliterator.tryAdvance, but
> because there is no way to transmit the information that the element is
> accepted or not
> (see my previous mail), you can not use a lambda here and you have to rely on an
> inner class.
>
> cheers,
> Rémi

From forax at univ-mlv.fr Tue Feb 12 08:04:43 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 12 Feb 2013 17:04:43 +0100
Subject: Spliterator.tryAdvance
In-Reply-To: <511A3B07.1060906@cs.oswego.edu>
References: <51190FA0.3020601@univ-mlv.fr> <511A3B07.1060906@cs.oswego.edu>
Message-ID: <511A681B.2090104@univ-mlv.fr>

On 02/12/2013 01:52 PM, Doug Lea wrote:
> On 02/11/13 10:34, Remi Forax wrote:
>> There is another point,
>> the specification should be relaxed to allow tryAdvance to not always
>> call the
>> consumer taken as parameter.
>
> These are all the same issue in disguise (including the one
> you mentioned that I didn't get :-)

CHM.search is different from the proposed Spliterator.tryAdvance that returns a value because tryAdvance never consumes more than one element (just one, in fact).
With that, you can use a well known value to say "I've done nothing", and you don't need to rely on "null" meaning nothing.

> The question is: Can you design Spliterators and/or
> related leaf-computation-level support such that none
> of the "basic" Stream ops require use of a lambda / inner class
> that needs a mutable variable?

yes !

> I took this path in ConcurrentHashMap
> (see
> http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/ConcurrentHashMap.html),
> resulting in 4 "basic" methods
> (plus 3 more for primitives). I think it is the right solution
> for CHM, but it cannot apply to Streams (CHM can rely on
> nullness, and imposes requirement that users pre-fuse multiple
> map and map-reduce ops, etc.)
>
> And if you explore what it would take to do this for the
> Stream API, it gets quickly out of hand -- at least
> a dozen or so operations that every Collection, Map, or
> other Stream/Spliterator source author would have to write.
> Which led to the present solution of only requiring
> forEach, trySplit, and tryAdvance.

forEach and tryAdvance that rely on side effects are not that bad for the leaf of a fork/join because, by design, fork/join puts variables into fields. But for a serial stream, forcing values to be stored in fields instead of on the stack or in registers is really a bad idea from a perf point of view.

Pre-fused operations try to tackle another problem, the fact that calls to the lambda are megamorphic. This can be solved in the stream by having dedicated paths or generating one code for the whole pipeline.

Here, we are talking about the spliterator interface, no other interface. IMO, the spliterator interface should have 3 operations: void forEach(Consumer), Object tryAdvance(Object, Function) that takes an element and tries to call the function on it, and U reduce(U, Function) (and reduceInt/reduceLong/reduceDouble).

/**
 * Sentinel value used by tryAdvance to signal that there is no more element.
 */
public static final Object END = new Object();

/**
 * If no remaining element exists, tryAdvance returns {@code END}.
 * If a remaining element exists, tryAdvance will try to perform the given action on it:
 * if the remaining element is filtered out, then the value noValue taken as parameter is returned,
 * else the action is called with the remaining element.
 *
 * @param noValue a value returned if the element is filtered out
 * @param action the action to perform.
 * @return {@code END} if no remaining elements existed
 *         upon entry to this method, else the return value of the action.
 */
Object tryAdvance(Object noValue, Function action);

so yes, there are more methods to implement, but you can use lambdas for most of the basic operations instead of using inner classes. So I'm not sure it's more cumbersome.

> -Doug

Rémi

>> If, for example, I want to implement a Spliterator that filters the
>> elements,
>> this implementation should be legal:
>>
>> class FilterSpliterator<T> implements Spliterator<T> {
>>   private final Spliterator<T> spliterator;
>>   private final Predicate<? super T> predicate;
>>
>>   public FilterSpliterator(Spliterator<T> spliterator, Predicate<? super T> predicate) {
>>     ....
>>   }
>>
>>   public void tryAdvance(Consumer<T> consumer) {
>>     spliterator.tryAdvance(element -> {
>>       if (predicate.test(element)) {
>>         consumer.accept(element);
>>       }
>>     });
>>   }
>> }
>>
>> otherwise, you have to use a while loop around spliterator.tryAdvance, but
>> because there is no way to transmit the information that the element is
>> accepted or not
>> (see my previous mail), you can not use a lambda here and you have to
>> rely on an
>> inner class.
>>
>> cheers,
>> Rémi

From brian.goetz at oracle.com Tue Feb 12 10:41:51 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 12 Feb 2013 13:41:51 -0500
Subject: FlatMapper
Message-ID: <511A8CEF.8070800@oracle.com>

Here's where things have currently landed with FlatMapper -- this is a type in java.util.stream, with nested specializations.
Full bikeshed season is now open. Are we OK with the name explodeInto()? Is this general enough to join the ranks of Function and Supplier as top-level types in java.util.function?

@FunctionalInterface
public interface FlatMapper<T, R> {
    void explodeInto(T element, Consumer<R> sink);

    @FunctionalInterface
    interface ToInt<T> {
        void explodeInto(T element, IntConsumer sink);
    }

    @FunctionalInterface
    interface ToLong<T> {
        void explodeInto(T element, LongConsumer sink);
    }

    @FunctionalInterface
    interface ToDouble<T> {
        void explodeInto(T element, DoubleConsumer sink);
    }

    @FunctionalInterface
    interface OfIntToInt {
        void explodeInto(int element, IntConsumer sink);
    }

    @FunctionalInterface
    interface OfLongToLong {
        void explodeInto(long element, LongConsumer sink);
    }

    @FunctionalInterface
    interface OfDoubleToDouble {
        void explodeInto(double element, DoubleConsumer sink);
    }
}

From joe.bowbeer at gmail.com Tue Feb 12 10:49:09 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Tue, 12 Feb 2013 10:49:09 -0800
Subject: FlatMapper
In-Reply-To: <511A8CEF.8070800@oracle.com>
References: <511A8CEF.8070800@oracle.com>
Message-ID: 

A verb that had some relation to "flat" would be nice - instead of explode, which doesn't.

flatten
extrude
??

On Feb 12, 2013 10:42 AM, "Brian Goetz" wrote:
> Here's where things have currently landed with FlatMapper -- this is a
> type in java.util.stream, with nested specializations.
>
> Full bikeshed season is now open. Are we OK with the name explodeInto()?
> Is this general enough to join the ranks of Function and Supplier as
> top-level types in java.util.function?
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > From Donald.Raab at gs.com Tue Feb 12 10:52:54 2013 From: Donald.Raab at gs.com (Raab, Donald) Date: Tue, 12 Feb 2013 13:52:54 -0500 Subject: FlatMapper In-Reply-To: <511A8CEF.8070800@oracle.com> References: <511A8CEF.8070800@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> Are we going to have a consistency issue with FlatMapper vs. Function? For instance we have ToIntFunction, but not ToIntFlatMapper. Instead we have FlatMapper.ToInt. > -----Original Message----- > From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda- > libs-spec-experts-bounces at openjdk.java.net] On Behalf Of Brian Goetz > Sent: Tuesday, February 12, 2013 1:42 PM > To: lambda-libs-spec-experts at openjdk.java.net > Subject: FlatMapper > > Here's where things have currently landed with FlatMapper -- this is a type > in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name explodeInto()? > Is this general enough to join the ranks of Function and Supplier as top- > level types in java.util.function? 
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } From brian.goetz at oracle.com Tue Feb 12 10:57:56 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 12 Feb 2013 13:57:56 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> Message-ID: <511A90B4.8030609@oracle.com> Since the name doesn't appear in implementations often, we can use a more descriptive name, even if it is long, such as mapAndFlattenInto. Would that be better? On 2/12/2013 1:49 PM, Joe Bowbeer wrote: > A verb that had some relation to "flat" would be nice - instead of > explode, which doesn't. > > flatten > extrude > ?? > > On Feb 12, 2013 10:42 AM, "Brian Goetz" > wrote: > > Here's where things have currently landed with FlatMapper -- this is > a type in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name > explodeInto()? Is this general enough to join the ranks of Function > and Supplier as top-level types in java.util.function? 
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > From brian.goetz at oracle.com Tue Feb 12 10:59:58 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 12 Feb 2013 13:59:58 -0500 Subject: FlatMapper In-Reply-To: <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> References: <511A8CEF.8070800@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> Message-ID: <511A912E.1040205@oracle.com> > Are we going to have a consistency issue with FlatMapper vs. > Function? For instance we have ToIntFunction, but not > ToIntFlatMapper. Instead we have FlatMapper.ToInt. I think that's a function of where it lands, which is open for discussion. Currently it is in java.util.stream, where the dominant convention is Foo.OfBar. If we moved it to java.util.function, we'd "flatten" the namespace. I currently lean towards JUS, since this does not seem as important a top-level type as Function, Predicate, or Supplier. But such decisions often turn around to bite one. 
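The FlatMapper shape under discussion can be exercised with a small self-contained sketch. Everything here apart from the explodeInto signature is a hypothetical stand-in: the type parameters are inferred (the archived listing dropped them), and the flatMap driver is scaffolding added only to show how a mapper pushes zero or more outputs into a sink:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class FlatMapperSketch {

    // Stand-in for the proposed type; the type parameters are a guess.
    @FunctionalInterface
    interface FlatMapper<T, U> {
        void explodeInto(T element, Consumer<U> sink);
    }

    // Scaffolding (not part of the proposal): drive the mapper over a list,
    // collecting everything it pushes into the sink.
    static <T, U> List<U> flatMap(List<T> source, FlatMapper<T, U> mapper) {
        List<U> out = new ArrayList<>();
        for (T t : source) {
            mapper.explodeInto(t, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        // One input element can yield zero, one, or many output elements.
        List<String> words = flatMap(Arrays.asList("flat map", "into", ""),
                (line, sink) -> {
                    for (String w : line.split(" ")) {
                        if (!w.isEmpty()) {
                            sink.accept(w);
                        }
                    }
                });
        System.out.println(words); // [flat, map, into]
    }
}
```

Note that the mapper never returns a value; it pushes results into the Consumer, which is what distinguishes it from an ordinary Function.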
From zhong.j.yu at gmail.com Tue Feb 12 11:19:18 2013 From: zhong.j.yu at gmail.com (Zhong Yu) Date: Tue, 12 Feb 2013 13:19:18 -0600 Subject: FlatMapper In-Reply-To: <511A8CEF.8070800@oracle.com> References: <511A8CEF.8070800@oracle.com> Message-ID: One common use case is to map zero or more elements in stream A into one element in stream B. People can, and will, use flatMap(FlatMapper) to achieve that (with a side-effecting mapper), even though it is the opposite of what "flat map" was known for. I think the method could use a more general name which is appropriate for both explode/implode. Another use case is to aggregate *some* (not all) elements of a stream to produce a result; it can be done by stream.flatMap(FlatMapper).findFirst(). If this use case is common enough, it deserves a standalone method, say aggregatePartially(FlatMapper). Now if FlatMapper is needed in other places too, it could use a more general name; after all, it is pretty much a normal function, except it inserts its result in a sink instead of returning it. Zhong Yu On Tue, Feb 12, 2013 at 12:41 PM, Brian Goetz wrote: > Here's where things have currently landed with FlatMapper -- this is a type > in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name explodeInto()? > Is this general enough to join the ranks of Function and Supplier as > top-level types in java.util.function? 
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } From forax at univ-mlv.fr Tue Feb 12 11:17:46 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 12 Feb 2013 20:17:46 +0100 Subject: FlatMapper In-Reply-To: <511A912E.1040205@oracle.com> References: <511A8CEF.8070800@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> <511A912E.1040205@oracle.com> Message-ID: <511A955A.8050208@univ-mlv.fr> On 02/12/2013 07:59 PM, Brian Goetz wrote: >> Are we going to have a consistency issue with FlatMapper vs. >> Function? For instance we have ToIntFunction, but not >> ToIntFlatMapper. Instead we have FlatMapper.ToInt. > > > I think that's a function of where it lands, which is open for > discussion. Currently it is in java.util.stream, where the dominant > convention is Foo.OfBar. If we moved it to java.util.function, we'd > "flatten" the namespace. > > I currently lean towards JUS, since this does not seem as important a > top-level type as Function, Predicate, or Supplier. But such decisions > often turn around to bite one. > I've already said this, but nobody cares: there is a usability issue with FlatMapper.ToInt, the very same as the one reported more than 10 years ago when java.awt.geom introduced classes like Point2D.Double [1][2]. 
cheers, Rémi [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4198349 [2] https://forums.oracle.com/forums/thread.jspa?threadID=1665781 From forax at univ-mlv.fr Tue Feb 12 11:45:23 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 12 Feb 2013 20:45:23 +0100 Subject: FlatMapper In-Reply-To: <738F0591-0F5F-443B-8C76-5A5B69556A2B@gmail.com> References: <511A8CEF.8070800@oracle.com> <6712820CB52CFB4D842561213A77C05404C3A894E0@GSCMAMP09EX.firmwide.corp.gs.com> <511A912E.1040205@oracle.com> <511A955A.8050208@univ-mlv.fr> <738F0591-0F5F-443B-8C76-5A5B69556A2B@gmail.com> Message-ID: <511A9BD3.8070006@univ-mlv.fr> On 02/12/2013 08:27 PM, Sam Pullara wrote: > I don't get it. Why not just avoid importing Point2D.Double? This works fine: > > package spullara; > > import java.awt.geom.Point2D; > > public class Test { > public static void main(String[] args) { > Point2D.Double p = new Point2D.Double(10.0, 20.0); > double d = Double.parseDouble("123.45"); > } > } > > Sam yes, just ... but who reads imports these days. Rémi > > On Feb 12, 2013, at 11:17 AM, Remi Forax wrote: > >> On 02/12/2013 07:59 PM, Brian Goetz wrote: >>>> Are we going to have a consistency issue with FlatMapper vs. >>>> Function? For instance we have ToIntFunction, but not >>>> ToIntFlatMapper. Instead we have FlatMapper.ToInt. >>> >>> I think that's a function of where it lands, which is open for discussion. Currently it is in java.util.stream, where the dominant convention is Foo.OfBar. If we moved it to java.util.function, we'd "flatten" the namespace. >>> >>> I currently lean towards JUS, since this does not seem as important a top-level type as Function, Predicate, or Supplier. But such decisions often turn around to bite one. >>> >> I've already said this, but nobody cares, >> there is a usability issue with FlatMapper.ToInt, the very same as the one reported >> more than 10 years ago when java.awt.geom introduced classes like Point2D.Double [1][2]. 
>> >> cheers, >> Rémi >> [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4198349 >> [2] https://forums.oracle.com/forums/thread.jspa?threadID=1665781 >> From forax at univ-mlv.fr Thu Feb 14 07:53:23 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 14 Feb 2013 16:53:23 +0100 Subject: A small JSON parsing library Message-ID: <511D0873.4000205@univ-mlv.fr> Hi all, just to see how it goes, I've written a small all-in-one-file library that parses JSON files into an object tree defined by the user. The mapping is specified using lambdas, so it's compact but can burn your eyes :) https://github.com/forax/jsonjedi The idea is that the JSON parser is triggered by the consumption of the corresponding stream, so when the code does, for example, a forEach on a Stream, the parser parses the corresponding objects. I've used an already existing push parser named json-simple for that. During the development, I've found two main gotchas. The first one is the scope rules of the lambda parameter; I've already sent a message about this rule. It seems that each time I write a page of code, the compiler stops me because I tend to re-use the same variable name for the very same object. In the example named Big [1], the builder of JSON schema is used recursively but I've had to use different names each time (builder, builder2, builder3). We should really remove this stupid rule from the JLS and go back to the classical shadowing rules. The second problem is that the implementation uses method handles, but due to the poor integration between method handles and lambdas, I have 20 lines of boilerplate and error-prone code [2] which is moreover executed too eagerly. 
cheers, Rémi [1] https://github.com/forax/jsonjedi/blob/master/src/Big.java [2] https://github.com/forax/jsonjedi/blob/master/src/jsonjedi/JSONSchemaBuilder.java#L358 From brian.goetz at oracle.com Thu Feb 14 08:20:42 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 14 Feb 2013 11:20:42 -0500 Subject: A small JSON parsing library In-Reply-To: <511D0873.4000205@univ-mlv.fr> References: <511D0873.4000205@univ-mlv.fr> Message-ID: <511D0EDA.4000106@oracle.com> > just to see how it goes, I've written a small all-in-one-file library > that parses JSON files into an object tree defined by the user. The > mapping is specified using lambdas, so it's compact but can burn your > eyes :) > > https://github.com/forax/jsonjedi Very cool! > During the development, I've found two main gotchas, > the first one is the scope rules of the lambda parameter, > I've already sent a message about this rule, it seems that each time I > write a page of code, the compiler stops me because I tend to re-use the > same variable name for the very same object. > In the example named Big [1], the builder of JSON schema is used > recursively but I've had to use different names each time (builder, > builder2, builder3). I can see how this would be annoying to write. But as a reader, I really prefer it! If all the variables were called "builder", I would be confused. Much easier if I know exactly which builder you are referring to -- especially since the declaration -- "builder -> { ..." -- often starts near the right margin. > We should really remove this stupid rule from the JLS and go back to the > classical shadowing rules. Your Big.java provides an excellent example of why this rule is great! 
:) From spullara at gmail.com Thu Feb 14 08:45:09 2013 From: spullara at gmail.com (Sam Pullara) Date: Thu, 14 Feb 2013 08:45:09 -0800 Subject: A small JSON parsing library In-Reply-To: <511D0873.4000205@univ-mlv.fr> References: <511D0873.4000205@univ-mlv.fr> Message-ID: This rule has also been a pain for me. Since naming is one of the hardest things in computer science, we shouldn't make it any harder. Sam On Feb 14, 2013 8:12 AM, "Remi Forax" wrote: > Hi all, > just to see how it goes, I've written a small all-in-one-file library that > parses JSON files into an object tree defined by the user. The mapping is > specified using lambdas, so it's compact but can burn your eyes :) > > https://github.com/forax/jsonjedi > > The idea is that the JSON parser is triggered by the consumption of the > corresponding stream, > so when the code does, for example, a forEach on a Stream, the parser parses the > corresponding objects. > I've used an already existing push parser named json-simple for that. > > During the development, I've found two main gotchas, > the first one is the scope rules of the lambda parameter, > I've already sent a message about this rule, it seems that each time I write > a page of code, the compiler stops me because I tend to re-use the same > variable name for the very same object. > In the example named Big [1], the builder of JSON schema is used > recursively but I've had to use different names each time (builder, builder2, > builder3). > We should really remove this stupid rule from the JLS and go back to the > classical shadowing rules. > > The second problem is that the implementation uses method handles, but > due to the poor integration between method handles and lambdas, I have 20 > lines of boilerplate and error-prone code [2] which is moreover executed > too eagerly. 
> > cheers, > Rémi > > [1] https://github.com/forax/jsonjedi/blob/master/src/Big.java > [2] https://github.com/forax/jsonjedi/blob/master/src/ > jsonjedi/JSONSchemaBuilder.java#L358 > > From maurizio.cimadamore at oracle.com Thu Feb 14 09:15:37 2013 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 14 Feb 2013 17:15:37 +0000 Subject: A small JSON parsing library In-Reply-To: <511D0EDA.4000106@oracle.com> References: <511D0873.4000205@univ-mlv.fr> <511D0EDA.4000106@oracle.com> Message-ID: <511D1BB9.6040003@oracle.com> My eyes are still burning ;-) Very pleased with the total absence of type witnesses whatsoever. Maurizio On 14/02/13 16:20, Brian Goetz wrote: >> just to see how it goes, I've written a small all-in-one-file library >> that parses JSON files into an object tree defined by the user. The >> mapping is specified using lambdas, so it's compact but can burn your >> eyes :) >> >> https://github.com/forax/jsonjedi > > Very cool! > >> During the development, I've found two main gotchas, >> the first one is the scope rules of the lambda parameter, >> I've already sent a message about this rule, it seems that each time I >> write a page of code, the compiler stops me because I tend to re-use the >> same variable name for the very same object. >> In the example named Big [1], the builder of JSON schema is used >> recursively but I've had to use different names each time (builder, >> builder2, builder3). > > I can see how this would be annoying to write. But as a reader, I > really prefer it! If all the variables were called "builder", I would > be confused. Much easier if I know exactly which builder you are > referring to -- especially since the declaration -- "builder -> { ..." > -- often starts near the right margin. > >> We should really remove this stupid rule from the JLS and go back to the >> classical shadowing rules. > > Your Big.java provides an excellent example of why this rule is great! 
:) > > From spullara at gmail.com Thu Feb 14 09:36:03 2013 From: spullara at gmail.com (Sam Pullara) Date: Thu, 14 Feb 2013 09:36:03 -0800 Subject: A small JSON parsing library In-Reply-To: <511D0EDA.4000106@oracle.com> References: <511D0873.4000205@univ-mlv.fr> <511D0EDA.4000106@oracle.com> Message-ID: On Feb 14, 2013, at 8:20 AM, Brian Goetz > We should really remove this stupid rule from the JLS and go back to the >> classical shadowing rules. > > Your Big.java provides an excellent example why this rule is great! :) I think the opposite. All those builders are the same object. I'd like the freedom to name them the same. Sam From david.lloyd at redhat.com Thu Feb 14 10:48:57 2013 From: david.lloyd at redhat.com (David M. Lloyd) Date: Thu, 14 Feb 2013 12:48:57 -0600 Subject: A small JSON parsing library In-Reply-To: References: <511D0873.4000205@univ-mlv.fr> <511D0EDA.4000106@oracle.com> Message-ID: <511D3199.3080800@redhat.com> On 02/14/2013 11:36 AM, Sam Pullara wrote: > > On Feb 14, 2013, at 8:20 AM, Brian Goetz >> We should really remove this stupid rule from the JLS and go back to the >>> classical shadowing rules. >> >> Your Big.java provides an excellent example why this rule is great! :) > > I think the opposite. All those builders are the same object. I'd like the freedom to name them the same. I agree, it should be up to the user. -- - DML From forax at univ-mlv.fr Thu Feb 14 12:10:06 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 14 Feb 2013 21:10:06 +0100 Subject: A small JSON parsing library In-Reply-To: References: <511D0873.4000205@univ-mlv.fr> <511D0EDA.4000106@oracle.com> Message-ID: <511D449E.9080103@univ-mlv.fr> On 02/14/2013 06:36 PM, Sam Pullara wrote: > On Feb 14, 2013, at 8:20 AM, Brian Goetz >> We should really remove this stupid rule from the JLS and go back to the >>> classical shadowing rules. >> Your Big.java provides an excellent example why this rule is great! :) > I think the opposite. 
All those builders are the same object. I'd like the freedom to name them the same. > > Sam > > It's also annoying when you do a filter then map, like paths.stream().filter(path -> path != null).map(path -> path.getFileName().toString()); again, here, you want to use the same name, because it's the same thing. Rémi From brian.goetz at oracle.com Thu Feb 14 12:56:00 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 14 Feb 2013 15:56:00 -0500 Subject: FlatMapper In-Reply-To: <511A8CEF.8070800@oracle.com> References: <511A8CEF.8070800@oracle.com> Message-ID: <511D4F60.4040407@oracle.com> OK, so far we have: - Joe asks for a better method name -- no suggestions other than mapAndFlatten - No consensus on whether this goes into JUS or JUF. On 2/12/2013 1:41 PM, Brian Goetz wrote: > Here's where things have currently landed with FlatMapper -- this is a > type in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name > explodeInto()? Is this general enough to join the ranks of Function and > Supplier as top-level types in java.util.function? 
> > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } From joe.bowbeer at gmail.com Thu Feb 14 13:33:35 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 14 Feb 2013 13:33:35 -0800 Subject: FlatMapper In-Reply-To: <511D4F60.4040407@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> Message-ID: I'm not opposed to explode, but I think it would be better to find a verb that is related to flattening. Extrude is better in that regard than explode. extrudeInto mapAndExtrude On the downside, 'explode' has more hacker cred than 'extrude'. --Joe On Thu, Feb 14, 2013 at 12:56 PM, Brian Goetz wrote: > OK, so far we have: > - Joe asks for a better method name -- no suggestions other than > mapAndFlatten > - No consensus on whether this goes into JUS or JUF. > > > > > On 2/12/2013 1:41 PM, Brian Goetz wrote: > >> Here's where things have currently landed with FlatMapper -- this is a >> type in java.util.stream, with nested specializations. >> >> Full bikeshed season is now open. Are we OK with the name >> explodeInto()? Is this general enough to join the ranks of Function and >> Supplier as top-level types in java.util.function? 
>> >> @FunctionalInterface >> public interface FlatMapper { >> void explodeInto(T element, Consumer sink); >> >> @FunctionalInterface >> interface ToInt { >> void explodeInto(T element, IntConsumer sink); >> } >> >> @FunctionalInterface >> interface ToLong { >> void explodeInto(T element, LongConsumer sink); >> } >> >> @FunctionalInterface >> interface ToDouble { >> void explodeInto(T element, DoubleConsumer sink); >> } >> >> @FunctionalInterface >> interface OfIntToInt { >> void explodeInto(int element, IntConsumer sink); >> } >> >> @FunctionalInterface >> interface OfLongToLong { >> void explodeInto(long element, LongConsumer sink); >> } >> >> @FunctionalInterface >> interface OfDoubleToDouble { >> void explodeInto(double element, DoubleConsumer sink); >> } >> } >> > From brian.goetz at oracle.com Fri Feb 15 10:48:24 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 15 Feb 2013 13:48:24 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> Message-ID: <511E82F8.1060509@oracle.com> So it seems the choice is: - Keep this tied to flatMap and keep it in JUS. Advantage: makes the complicated flatMap(FlatMapper) operation easier to understand. - Abstract this into a general "map to multiple values and dump results into a Consumer" type, move to JUF, and rename to something like "MultiFunction". Advantage: more future flexibility; Disadvantage: mostly guessing about what we might want in the future. I lean towards the first. In which case the remaining decision is: what to name the method. Maybe: mapAndFlattenInto ? On 2/14/2013 4:33 PM, Joe Bowbeer wrote: > I'm not opposed to explode, but I think it would be better to find a > verb that is related to flattening. Extrude is better in that regard > than explode. > > extrudeInto > mapAndExtrude > > On the downside, 'explode' has more hacker cred than 'extrude'. 
> > --Joe > > > On Thu, Feb 14, 2013 at 12:56 PM, Brian Goetz > > wrote: > > OK, so far we have: > - Joe asks for a better method name -- no suggestions other than > mapAndFlatten > - No consensus on whether this goes into JUS or JUF. > > > > > On 2/12/2013 1:41 PM, Brian Goetz wrote: > > Here's where things have currently landed with FlatMapper -- > this is a > type in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name > explodeInto()? Is this general enough to join the ranks of > Function and > Supplier as top-level types in java.util.function? > > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, DoubleConsumer sink); > } > } > > From brian.goetz at oracle.com Fri Feb 15 12:04:14 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 15 Feb 2013 15:04:14 -0500 Subject: Characterizing stream operation Message-ID: <511E94BE.1010901@oracle.com> We've divided stream operations as follows: Intermediate operations. Always lazy. Always produce another stream. Stateful operations. A kind of intermediate operation. Currently always transforms to the same stream type (e.g., Stream<T> to Stream<T>), though this could conceivably change (we haven't found any, though). Must provide their own parallel implementation. 
Parallel pipelines containing stateful operations are implicitly "sliced" into segments on stateful operation boundaries, and executed in segments. Terminal operations. The only thing that kicks off stream computation. Produces a non-stream result (value or side-effects.) For each of these, once you perform an operation on a stream (intermediate or terminal), the stream is *consumed* and no more operations can be performed on that stream. (Not entirely true, as the TCK team will almost certainly point out to us eventually; there are some ops that are no-ops and probably will succeed unless we add consumed checks.) These names are fine from the perspective of the implementation; when implementing an operation, you will be implementing one of these three types, and conveniently there is a base type for each to subclass. From the user perspective, though, they may not be as helpful as some alternative taxonomies, such as: - lazy operation -- what we now call intermediate operation - stateful lazy operation -- what we now call stateful - consuming operation -- what we now call terminal These are good in that they keep a key characteristic -- when the computation happens -- in full view. However, they also create less clean boundaries. For example, iterator() is a consuming operation from the perspective of the stream, but from the perspective of the user, may be thought of as lazy. Thoughts on how to adjust this naming to be more intuitive to users? From forax at univ-mlv.fr Fri Feb 15 15:46:08 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 16 Feb 2013 00:46:08 +0100 Subject: Characterizing stream operation In-Reply-To: <511E94BE.1010901@oracle.com> References: <511E94BE.1010901@oracle.com> Message-ID: <511EC8C0.3010507@univ-mlv.fr> On 02/15/2013 09:04 PM, Brian Goetz wrote: > We've divided stream operations as follows: > > Intermediate operations. Always lazy. Always produce another stream. > > Stateful operations. A kind of intermediate operation. 
Currently > always transforms to the same stream type (e.g., Stream to > Stream), though this could conceivably change (we haven't found > any, though). Must provide their own parallel implementation. > Parallel pipelines containing stateful operations are implicitly > "sliced" into segments on stateful operation boundaries, and executed > in segments. > > Terminal operations. The only thing that kicks off stream > computation. Produces a non-stream result (value or side-effects.) > > For each of these, once you perform an operation on a stream > (intermediate or terminal), the stream is *consumed* and no more > operations can be performed on that stream. (Not entirely true, as > the TCK team will almost certainly point out to us eventually; there > are some ops that are no-ops and probably will succeed unless we add > consumed checks.) > > > These names are fine from the perspective of the implementation; when > implementing an operation, you will be implementing one of these three > types, and conveniently there is a base type for each to subclass. > > From the user perspective, though, they may not be as helpful as some > alternative taxonomies, such as: > > - lazy operation -- what we now call intermediate operation > - stateful lazy operation -- what we now call stateful > - consuming operation -- what we now call terminal > > These are good in that they keep a key characteristic -- when the > computation happens -- in full view. However, they also create less > clean boundaries. For example, iterator() is a consuming operation > from the perspective of the stream, but from the perspective of the > user, may be thought of as lazy. > > Thoughts on how to adjust this naming to be more intuitive to users? > lazy and terminal are Ok for me, stateful can be renamed to intermediate stateful. 
Rémi From joe.bowbeer at gmail.com Sat Feb 16 21:31:19 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 16 Feb 2013 21:31:19 -0800 Subject: FlatMapper In-Reply-To: <511E82F8.1060509@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> Message-ID: > mapAndFlattenInto ? OK On Fri, Feb 15, 2013 at 10:48 AM, Brian Goetz wrote: > So it seems the choice is: > > - Keep this tied to flatMap and keep it in JUS. Advantage: makes the > complicated flatMap(FlatMapper) operation easier to understand. > > - Abstract this into a general "map to multiple values and dump results > into a Consumer" type, move to JUF, and rename to something like > "MultiFunction". Advantage: more future flexibility; Disadvantage: mostly > guessing about what we might want in the future. > > I lean towards the first. > > In which case the remaining decision is: what to name the method. > > Maybe: > > mapAndFlattenInto > > ? > > > On 2/14/2013 4:33 PM, Joe Bowbeer wrote: > >> I'm not opposed to explode, but I think it would be better to find a >> verb that is related to flattening. Extrude is better in that regard >> than explode. >> >> extrudeInto >> mapAndExtrude >> >> On the downside, 'explode' has more hacker cred than 'extrude'. >> >> --Joe >> >> >> On Thu, Feb 14, 2013 at 12:56 PM, Brian Goetz > > wrote: >> >> OK, so far we have: >> - Joe asks for a better method name -- no suggestions other than >> mapAndFlatten >> - No consensus on whether this goes into JUS or JUF. >> >> >> >> >> On 2/12/2013 1:41 PM, Brian Goetz wrote: >> >> Here's where things have currently landed with FlatMapper -- >> this is a >> type in java.util.stream, with nested specializations. >> >> Full bikeshed season is now open. Are we OK with the name >> explodeInto()? Is this general enough to join the ranks of >> Function and >> Supplier as top-level types in java.util.function? 
>> >> @FunctionalInterface >> public interface FlatMapper { >> void explodeInto(T element, Consumer sink); >> >> @FunctionalInterface >> interface ToInt { >> void explodeInto(T element, IntConsumer sink); >> } >> >> @FunctionalInterface >> interface ToLong { >> void explodeInto(T element, LongConsumer sink); >> } >> >> @FunctionalInterface >> interface ToDouble { >> void explodeInto(T element, DoubleConsumer sink); >> } >> >> @FunctionalInterface >> interface OfIntToInt { >> void explodeInto(int element, IntConsumer sink); >> } >> >> @FunctionalInterface >> interface OfLongToLong { >> void explodeInto(long element, LongConsumer sink); >> } >> >> @FunctionalInterface >> interface OfDoubleToDouble { >> void explodeInto(double element, DoubleConsumer sink); >> } >> } >> >> >> From forax at univ-mlv.fr Sun Feb 17 03:17:06 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 17 Feb 2013 12:17:06 +0100 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> Message-ID: <5120BC32.3000309@univ-mlv.fr> On 02/17/2013 06:31 AM, Joe Bowbeer wrote: > > mapAndFlattenInto ? > > OK I really like explode just because we will see comments on the mailing list saying that someone has a problem when they try to explode :) mapAndFlattenInto is a little too verbose for me, mapAndFlat ? Rémi > > > On Fri, Feb 15, 2013 at 10:48 AM, Brian Goetz > wrote: > > So it seems the choice is: > > - Keep this tied to flatMap and keep it in JUS. Advantage: makes > the complicated flatMap(FlatMapper) operation easier to understand. > > - Abstract this into a general "map to multiple values and dump > results into a Consumer" type, move to JUF, and rename to > something like "MultiFunction". Advantage: more future > flexibility; Disadvantage: mostly guessing about what we might > want in the future. > > I lean towards the first. > > In which case the remaining decision is: what to name the method. 
> > Maybe: > > mapAndFlattenInto > > ? > > > On 2/14/2013 4:33 PM, Joe Bowbeer wrote: > > I'm not opposed to explode, but I think it would be better to > find a > verb that is related to flattening. Extrude is better in that > regard > than explode. > > extrudeInto > mapAndExtrude > > On the downside, 'explode' has more hacker cred than 'extrude'. > > --Joe > > > On Thu, Feb 14, 2013 at 12:56 PM, Brian Goetz > > >> wrote: > > OK, so far we have: > - Joe asks for a better method name -- no suggestions > other than > mapAndFlatten > - No consensus on whether this goes into JUS or JUF. > > > > > On 2/12/2013 1:41 PM, Brian Goetz wrote: > > Here's where things have currently landed with > FlatMapper -- > this is a > type in java.util.stream, with nested specializations. > > Full bikeshed season is now open. Are we OK with the name > explodeInto()? Is this general enough to join the > ranks of > Function and > Supplier as top-level types in java.util.function? > > @FunctionalInterface > public interface FlatMapper { > void explodeInto(T element, Consumer sink); > > @FunctionalInterface > interface ToInt { > void explodeInto(T element, IntConsumer sink); > } > > @FunctionalInterface > interface ToLong { > void explodeInto(T element, LongConsumer sink); > } > > @FunctionalInterface > interface ToDouble { > void explodeInto(T element, DoubleConsumer > sink); > } > > @FunctionalInterface > interface OfIntToInt { > void explodeInto(int element, IntConsumer sink); > } > > @FunctionalInterface > interface OfLongToLong { > void explodeInto(long element, LongConsumer > sink); > } > > @FunctionalInterface > interface OfDoubleToDouble { > void explodeInto(double element, > DoubleConsumer sink); > } > } > > > From tim at peierls.net Sun Feb 17 06:36:38 2013 From: tim at peierls.net (Tim Peierls) Date: Sun, 17 Feb 2013 09:36:38 -0500 Subject: FlatMapper In-Reply-To: <5120BC32.3000309@univ-mlv.fr> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> 
<511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> Message-ID: On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax wrote: > mapAndFlattenInto is a little to verbose for me, mapAndFlat ? > No, has to be a verb. I'd still understand flattenInto, leaving the mapping part to be implied by the type name. --tim From dl at cs.oswego.edu Sun Feb 17 06:55:37 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 17 Feb 2013 09:55:37 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> Message-ID: <5120EF69.2020105@cs.oswego.edu> On 02/17/13 09:36, Tim Peierls wrote: > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > wrote: > > mapAndFlattenInto is a little to verbose for me, mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping part to be implied by the > type name. > Deja vu all over again. Yeah, flattenInto seems fine. A similarly fused map-reduce just called reduce would also have been fine. More than fine... -Doug From brian.goetz at oracle.com Sun Feb 17 11:07:59 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 17 Feb 2013 14:07:59 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> Message-ID: <51212A8F.7000109@oracle.com> flattenInto seems the best so far. On 2/17/2013 9:36 AM, Tim Peierls wrote: > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > wrote: > > mapAndFlattenInto is a little to verbose for me, mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping part to be implied > by the type name. 
> > --tim From forax at univ-mlv.fr Sun Feb 17 11:06:12 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 17 Feb 2013 20:06:12 +0100 Subject: FlatMapper In-Reply-To: <51212A8F.7000109@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> Message-ID: <51212A24.2020101@univ-mlv.fr> On 02/17/2013 08:07 PM, Brian Goetz wrote: > flattenInto seems the best so far. +1 Rémi > > On 2/17/2013 9:36 AM, Tim Peierls wrote: >> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > wrote: >> >> mapAndFlattenInto is a little too verbose for me, mapAndFlat ? >> >> >> No, has to be a verb. >> >> I'd still understand flattenInto, leaving the mapping part to be implied >> by the type name. >> >> --tim From joe.bowbeer at gmail.com Sun Feb 17 12:29:25 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sun, 17 Feb 2013 12:29:25 -0800 Subject: FlatMapper In-Reply-To: <51212A24.2020101@univ-mlv.fr> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> Message-ID: flattenInto gets my vote On Feb 17, 2013 11:09 AM, "Remi Forax" wrote: > On 02/17/2013 08:07 PM, Brian Goetz wrote: > >> flattenInto seems the best so far. >> > > +1 > > Rémi > > >> On 2/17/2013 9:36 AM, Tim Peierls wrote: >> >>> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >>> > wrote: >>> >>> mapAndFlattenInto is a little too verbose for me, mapAndFlat ? >>> >>> >>> No, has to be a verb. >>> >>> I'd still understand flattenInto, leaving the mapping part to be implied >>> by the type name. 
>>> >>> --tim >>> >> > From brian.goetz at oracle.com Mon Feb 18 12:20:23 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 15:20:23 -0500 Subject: Reducing reduce In-Reply-To: <51191F3D.4090203@oracle.com> References: <51191F3D.4090203@oracle.com> Message-ID: <51228D07.9060004@oracle.com> Circling back to this (i.e., "reducing reduce", redux): There are a lot of considerations here, many mostly accidental (e.g., consequences of erasure and the primitive/reference divide). The three-arg functional reduce form is functionally equivalent to the two-arg form, except that there are some constructions that are more efficient to handle in the three arg form. However, the best example we came up with, Joe's string compare, suffers because he had to use boxing. So we're currently in a place where the best example to support this form has other defects that make the form hard to support. And, any form of functional reduce on a reference would likely result in a lot of object creation, so the optimization of eliding some of the mapping would have to overcome that. Further, one can still handle this without boxing using collect() and an explicit mutable result holder. On the other hand, if/when the language acquires tuples, it will be a very different story, and this form would become infinitely more useful. So I think the evidence weighs slightly in favor of ditching this form for now (though I'd feel better if people didn't have to use either an ad-hoc class or a single-element array as the data box when using the collect() form.) Secondarily, ditching the three-arg form from Stream would remove one element of support for naming reduce and collect differently; part of the motivation for a different name was that the three-arg collect and three-arg reduce overloaded very poorly if they had the same name. However, I think we should resist the temptation to act on this. 
I think (a) there is pedagogical value in separating function and mutable reduce forms, and (b) if we do this, we slam the door on the more flexible version, which will badly bite us in a tupled future. We might still consider the three-arg version for IntStream. That's the case where Joe's example works. On 2/11/2013 11:41 AM, Brian Goetz wrote: > Now that we've added all the shapes of map() to Stream (map to > ref/int/long/double), and we've separated functional reduce (currently > called reduce) from mutable reduce (currently called collect), I think > that leaves room for taking out one of the reduce methods from Stream: > > U reduce(U identity, > BiFunction accumulator, > BinaryOperator reducer); > > This is the one that confuses everyone anyway, and I don't think we need > it any more. > > The argument for having this form instead of discrete map+reduce are: > - fused map+reduce reduces boxing > - this three-arg form can also fold filtering into the accumulation > > However, since we now have primitive-bearing map methods, and we can do > filtering before and after the map, is this form really carrying its > weight? Specifically because people find it counterintuitive, we should > consider dropping it and guiding people towards map+reduce. > > For example, "sum of pages" over a stream of Documents is better written > as: > > docs.map(Document::getPageCount).sum() > > rather than > > docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) > > The big place where we need three-arg reduce is when we're folding into > a mutable store. But that's now handled by collect(). > > Have I missed any use cases that would justify keeping this form? 
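The forms being compared in this thread can be written out concretely. The sketch below uses the Stream API as it eventually shipped in Java 8 -- where the three-arg `reduce(identity, accumulator, combiner)` did in fact survive, alongside `collect` with an explicit mutable holder -- and stands in for the `Document` example with string lengths:

```java
import java.util.Arrays;
import java.util.List;

public class ReduceForms {

    // Discrete map + reduce: map each element to an int, then sum.
    public static int viaMapSum(List<String> docs) {
        return docs.stream().mapToInt(String::length).sum();
    }

    // Fused three-arg reduce: identity, accumulator (folds one element
    // into a partial result), and combiner (merges partial results from
    // parallel subtasks; unused in a sequential run but must be supplied).
    public static int viaReduce(List<String> docs) {
        return docs.stream().reduce(0, (count, s) -> count + s.length(), Integer::sum);
    }

    // Mutable reduction: collect() with an explicit one-slot result holder,
    // the "single-element array as the data box" mentioned above.
    public static int viaCollect(List<String> docs) {
        int[] box = docs.stream().collect(
                () -> new int[1],
                (a, s) -> a[0] += s.length(),
                (a, b) -> a[0] += b[0]);
        return box[0];
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("ab", "cde", "f");
        System.out.println(viaMapSum(docs));  // 6
        System.out.println(viaReduce(docs));  // 6
        System.out.println(viaCollect(docs)); // 6
    }
}
```

The map+sum form avoids boxing entirely via the int-specialized stream, which is the crux of the "is the fused form carrying its weight" question.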
> From brian.goetz at oracle.com Mon Feb 18 12:50:52 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 15:50:52 -0500 Subject: forEachUntil Message-ID: <5122942C.7000600@oracle.com> Based on further user feedback, I think the name forEachUntil is too confusing; it makes people (including some members of this expert group) think that it is supposed to be an encounter-based limiting operation, rather than an externally-based cancelling operation. Until seems to be inextricably linked in people's minds to encounter order, with all the attendant confusion. People seem more able to understand cancellation, and in particular to understand that cancellation is usually a cooperative, best-efforts thing rather than the deterministic content-based limiting that people have in mind. Accordingly, I think we should rename to "forEachWithCancel", which is more suggestive (and, secondarily, the ugly name subtly reinforces that it serves uncommon use cases.) From tim at peierls.net Mon Feb 18 13:29:51 2013 From: tim at peierls.net (Tim Peierls) Date: Mon, 18 Feb 2013 16:29:51 -0500 Subject: forEachUntil In-Reply-To: <5122942C.7000600@oracle.com> References: <5122942C.7000600@oracle.com> Message-ID: Overloading forEach isn't possible? I don't think extra uglification beyond including a canCancel argument is needed to reinforce the uncommonness of the usage. --tim On Mon, Feb 18, 2013 at 3:50 PM, Brian Goetz wrote: > Based on further user feedback, I think the name forEachUntil is too > confusing; it makes people (including some members of this expert group) > think that it is supposed to be an encounter-based limiting operation, > rather than an externally-based cancelling operation. Until seems to be > inextricably linked in people's minds to encounter order, with all the > attendant confusion. 
People seem more able to understand cancellation, and > in particular to understand that cancellation is usually a cooperative, > best-efforts thing rather than the deterministic content-based limiting > that people have in mind. > > Accordingly, I think we should rename to "forEachWithCancel", which is > more suggestive (and, secondarily, the ugly name subtly reinforces that it > serves uncommon use cases.) > From brian.goetz at oracle.com Mon Feb 18 13:32:01 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 16:32:01 -0500 Subject: forEachUntil In-Reply-To: References: <5122942C.7000600@oracle.com> Message-ID: <51229DD1.9000904@oracle.com> Overloading forEach is certainly possible. However, I think that it may well be subject to the same "this is not the method you are looking for" confusion as forEachUntil was (though is probably slightly better in this way.) On 2/18/2013 4:29 PM, Tim Peierls wrote: > Overloading forEach isn't possible? I don't think extra uglification > beyond including a canCancel argument is needed to reinforce the > uncommonness of the usage. > > --tim > > On Mon, Feb 18, 2013 at 3:50 PM, Brian Goetz > wrote: > > Based on further user feedback, I think the name forEachUntil is too > confusing; it makes people (including some members of this expert > group) think that it is supposed to be an encounter-based limiting > operation, rather than an externally-based cancelling operation. > Until seems to be inextricably linked in people's minds to > encounter order, with all the attendant confusion. People seem more > able to understand cancellation, and in particular to understand > that cancellation is usually a cooperative, best-efforts thing > rather than the deterministic content-based limiting that people > have in mind. > > Accordingly, I think we should rename to "forEachWithCancel", which > is more suggestive (and, secondarily, the ugly name subtly > reinforces that it serves uncommon use cases.) 
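Neither `forEachUntil` nor `forEachWithCancel` made it into the final API, so the semantics can only be sketched. Below is a hypothetical helper illustrating the cooperative, best-efforts flavor described above: an external flag is polled between elements, so cancellation is prompt but not exact.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class CancelSketch {

    // Hypothetical helper in the spirit of forEachWithCancel: consume
    // elements until an externally-set flag flips. Cancellation is
    // cooperative and best-efforts -- the flag is polled between
    // elements, so an element already being processed runs to completion.
    public static <T> void forEachWithCancel(Stream<T> stream,
                                             AtomicBoolean cancelled,
                                             Consumer<? super T> action) {
        Iterator<T> it = stream.iterator();
        while (!cancelled.get() && it.hasNext()) {
            action.accept(it.next());
        }
    }

    public static void main(String[] args) {
        AtomicBoolean cancelled = new AtomicBoolean();
        StringBuilder seen = new StringBuilder();
        // Normally another thread (a timeout, a UI button) would set the
        // flag; here the action itself sets it, for a deterministic demo.
        forEachWithCancel(Stream.of("a", "b", "c", "d"), cancelled, s -> {
            seen.append(s);
            if (s.equals("b")) cancelled.set(true);
        });
        System.out.println(seen); // ab
    }
}
```

This is exactly the contrast with `limit(n)`: the cut-off point depends on when the signal arrives, not on encounter order or element count.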
> > From joe.bowbeer at gmail.com Mon Feb 18 15:16:38 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 18 Feb 2013 15:16:38 -0800 Subject: Reducing reduce In-Reply-To: <51228D07.9060004@oracle.com> References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> Message-ID: I wouldn't have thought that boxed had anything to do with "our" 3-arg reduce example. Boxed is by-product of my decision to use a primitive generator (intRange). I could have picked a different generator and then I wouldn't have needed boxed(), yet the 3-arg reduce form would be unaffected. There are lots of applications for prefix-sums. Guy Blelloch listed 13 in 1993, and string-compare just happened to be at the top of the list, where I started: http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf I have a couple of related questions, which I think may be raised by others: 1. Why don't we have a 3-arg mapreduce like Guy Steele discusses in his Parallel-Not talks? http://vimeo.com/6624203 (or a map-scan-zip?) 2. Why don't we have a parallel fold (map+combine) like Rich Hickey added to Clojure? http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html --Joe On Mon, Feb 18, 2013 at 12:20 PM, Brian Goetz wrote: > Circling back to this (i.e., "reducing reduce", redux): > > There are a lot of considerations here, many mostly accidental (e.g., > consequences of erasure and the primitive/reference divide). > > The three-arg functional reduce form is functionally equivalent to the > two-arg form, except that there are some constructions that are more > efficient to handle in the three arg form. However, the best example we > came up with, Joe's string compare, suffers because he had to use boxing. > So we're currently in a place where the best example to support this form > has other defects that make the form hard to support. 
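As an aside on Joe's prefix-sum reference: Java 8 did end up shipping a scan primitive, though outside the Stream API. A minimal illustration with `java.util.Arrays.parallelPrefix`, which cumulates an array in place with an associative operator:

```java
import java.util.Arrays;

public class PrefixSum {
    public static void main(String[] args) {
        // Inclusive prefix sum (a Blelloch-style scan): each slot becomes
        // the sum of itself and everything before it. parallelPrefix
        // cumulates in place with an associative operator and may split
        // the work across threads.
        int[] a = {1, 2, 3, 4, 5};
        Arrays.parallelPrefix(a, Integer::sum);
        System.out.println(Arrays.toString(a)); // [1, 3, 6, 10, 15]
    }
}
```

Associativity is what makes the parallel split legal, the same requirement the combiner in three-arg reduce carries.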
And, any form of > functional reduce on a reference would likely result in a lot of object > creation, so the optimization of eliding some of the mapping would have to > overcome that. > > Further, one can still handle this without boxing using collect() and an > explicit mutable result holder. On the other hand, if/when the language > acquires tuples, it will be a very different story, and this form would > become infinitely more useful. > > So I think the evidence weighs slightly in favor of ditching this form for > now (though I'd feel better if people didn't have to use either an ad-hoc > class or a single-element array as the data box when using the collect() > form.) > > Secondarily, ditching the three-arg form from Stream would remove one > element of support for naming reduce and collect differently; part of the > motivation for a different name was that the three-arg collect and > three-arg reduce overloaded very poorly if they had the same name. However, > I think we should resist the temptation to act on this. I think (a) there > is pedagogical value in separating function and mutable reduce forms, and > (b) if we do this, we slam the door on the more flexible version, which > will badly bite us in a tupled future. > > We might still consider the three-arg version for IntStream. That's the > case where Joe's example works. > > > On 2/11/2013 11:41 AM, Brian Goetz wrote: > >> Now that we've added all the shapes of map() to Stream (map to >> ref/int/long/double), and we've separated functional reduce (currently >> called reduce) from mutable reduce (currently called collect), I think >> that leaves room for taking out one of the reduce methods from Stream: >> >> U reduce(U identity, >> BiFunction accumulator, >> BinaryOperator reducer); >> >> This is the one that confuses everyone anyway, and I don't think we need >> it any more. 
>> >> The argument for having this form instead of discrete map+reduce are: >> - fused map+reduce reduces boxing >> - this three-arg form can also fold filtering into the accumulation >> >> However, since we now have primitive-bearing map methods, and we can do >> filtering before and after the map, is this form really carrying its >> weight? Specifically because people find it counterintuitive, we should >> consider dropping it and guiding people towards map+reduce. >> >> For example, "sum of pages" over a stream of Documents is better written >> as: >> >> docs.map(Document::getPageCount).sum() >> >> rather than >> >> docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) >> >> The big place where we need three-arg reduce is when we're folding into >> a mutable store. But that's now handled by collect(). >> >> Have I missed any use cases that would justify keeping this form? >> >> From brian.goetz at oracle.com Mon Feb 18 15:20:20 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 18:20:20 -0500 Subject: Reducing reduce In-Reply-To: References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> Message-ID: <5122B734.2030903@oracle.com> > 2. Why don't we have a parallel fold (map+combine) like Rich Hickey > added to Clojure? > > http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html I'm confused -- the 3-arg reduce was directly inspired by Rich's Reducers work? From joe.bowbeer at gmail.com Mon Feb 18 15:34:16 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 18 Feb 2013 15:34:16 -0800 Subject: Reducing reduce In-Reply-To: <5122B734.2030903@oracle.com> References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> <5122B734.2030903@oracle.com> Message-ID: Maybe I'm confused. Why are you now trying to eliminate it? On Mon, Feb 18, 2013 at 3:20 PM, Brian Goetz wrote: > 2. Why don't we have a parallel fold (map+combine) like Rich Hickey >> added to Clojure? 
>> >> http://clojure.com/blog/2012/**05/08/reducers-a-library-and-** >> model-for-collection-**processing.html >> > > I'm confused -- the 3-arg reduce was directly inspired by Rich's Reducers > work? > From brian.goetz at oracle.com Mon Feb 18 15:46:04 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 18 Feb 2013 18:46:04 -0500 Subject: Reducing reduce In-Reply-To: References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> <5122B734.2030903@oracle.com> Message-ID: <5122BD3C.1050308@oracle.com> See earlier posting on "reducing reduce." Two reasons: - People find it confusing -- they want to know why they have to specify two functions that "do the same thing." (Because they are thinking sequentially.) And that confusion then infects all the reduce forms. - The cases where it has an advantage over either map+reduce or collect seem to be somewhat limited. And, due to lots of accidental complexity reasons, often involve a fair amount of object creation overhead, which starts to eat into the potential performance advantage. So through the combination of "few people will use it, but it confuses everyone", it seems a reasonable candidate for pruning. On 2/18/2013 6:34 PM, Joe Bowbeer wrote: > Maybe I'm confused. > > Why are you now trying to eliminate it? > > > > On Mon, Feb 18, 2013 at 3:20 PM, Brian Goetz > wrote: > > 2. Why don't we have a parallel fold (map+combine) like Rich Hickey > added to Clojure? > > http://clojure.com/blog/2012/__05/08/reducers-a-library-and-__model-for-collection-__processing.html > > > > I'm confused -- the 3-arg reduce was directly inspired by Rich's > Reducers work? 
> > From joe.bowbeer at gmail.com Mon Feb 18 15:54:14 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 18 Feb 2013 15:54:14 -0800 Subject: Reducing reduce In-Reply-To: <5122BD3C.1050308@oracle.com> References: <51191F3D.4090203@oracle.com> <51228D07.9060004@oracle.com> <5122B734.2030903@oracle.com> <5122BD3C.1050308@oracle.com> Message-ID: I'm not seeing a good reason not to keep the 3-arg form. I like it for pedagogical reasons. The string-compare is not really a practical example, after all, so I'm not bothered that it uses boxed(). I think there are potential practical uses for the 3-arg form, and its inclusion gives us something to point at when asked by viewers of Guy's talks or users of Clojure. This is not a classic case of YAGNI. The "are going to" case has already been made. The argument that, well, it doesn't work so nicely in Java, will need some more explaining before I buy it. Joe On Mon, Feb 18, 2013 at 3:46 PM, Brian Goetz wrote: > See earlier posting on "reducing reduce." Two reasons: > > - People find it confusing -- they want to know why they have to specify > two functions that "do the same thing." (Because they are thinking > sequentially.) And that confusion then infects all the reduce forms. > > - The cases where it has an advantage over either map+reduce or collect > seem to be somewhat limited. And, due to lots of accidental complexity > reasons, often involve a fair amount of object creation overhead, which > starts to eat into the potential performance advantage. > > So through the combination of "few people will use it, but it confuses > everyone", it seems a reasonable candidate for pruning. > > > > On 2/18/2013 6:34 PM, Joe Bowbeer wrote: > >> Maybe I'm confused. >> >> Why are you now trying to eliminate it? >> >> >> >> On Mon, Feb 18, 2013 at 3:20 PM, Brian Goetz > > wrote: >> >> 2. Why don't we have a parallel fold (map+combine) like Rich >> Hickey >> added to Clojure? 
>> >> http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html >> >> >> I'm confused -- the 3-arg reduce was directly inspired by Rich's >> Reducers work? >> >> >> From Vladimir.Zakharov at gs.com Mon Feb 18 20:06:12 2013 From: Vladimir.Zakharov at gs.com (Zakharov, Vladimir) Date: Mon, 18 Feb 2013 23:06:12 -0500 Subject: forEachUntil In-Reply-To: <5122942C.7000600@oracle.com> References: <5122942C.7000600@oracle.com> Message-ID: Sounds reasonable. "forEachWithCancel", perhaps "forEachUntilCancelled" (either works as it implies an external actor doing the cancellation). -----Original Message----- From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda-libs-spec-experts-bounces at openjdk.java.net] On Behalf Of Brian Goetz Sent: Monday, February 18, 2013 3:51 PM To: lambda-libs-spec-experts at openjdk.java.net Subject: forEachUntil Based on further user feedback, I think the name forEachUntil is too confusing; it makes people (including some members of this expert group) think that it is supposed to be an encounter-based limiting operation, rather than an externally-based cancelling operation. 
Until seems to be inextricably linked in people's minds to encounter order, with all the attendant confusion. People seem more able to understand cancellation, and in particular to understand that cancellation is usually a cooperative, best-efforts thing rather than the deterministic content-based limiting that people have in mind. Accordingly, I think we should rename to "forEachWithCancel", which is more suggestive (and, secondarily, the ugly name subtly reinforces that it serves uncommon use cases.) From brian.goetz at oracle.com Thu Feb 21 07:44:24 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 10:44:24 -0500 Subject: Iterable.stream() Message-ID: <512640D8.2020500@oracle.com> Currently we define stream() and parallelStream() on Collection, with defaults like: default Stream stream() { return Streams.stream( () -> Streams.spliterator(iterator(), size(), Spliterator.SIZED), Spliterator.SIZED); } In other words, if a Collection does not override stream(), it gets the stream based on the iterator. It has been suggested that we could move stream/parallelStream() up to Iterable. They could use an almost identical default, except that they don't know the SIZED flag. (The default in Collection would stay, so existing inheritors of the Collection default wouldn't see any difference. (This is why default methods are virtual.)) Several people have asked why not move these to Iterable, since some APIs return "Iterable" as a least-common-denominator aggregate type, and this would allow those APIs to participate in the stream fun. There are also a handful of other types that implement Iterable, such as Path (Iterable) and DirectoryStream (where we'd added an entries() method, but that would just then become stream()). The sole downside is that it creates (yet another) external dependency from java.lang.Iterable -- now to java.util.stream. Thoughts? 
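The draft default above uses internal `Streams` helpers; the same idea can be sketched against the public API as it shipped (`StreamSupport` plus `Spliterators`). `StreamIterable` below is a hypothetical stand-in for adding the default to `Iterable` itself:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class IterableStream {

    // Hypothetical stand-in for putting the default on Iterable itself.
    // Without a size() to consult, the spliterator reports no SIZED
    // characteristic (unlike the Collection default quoted above).
    public interface StreamIterable<T> extends Iterable<T> {
        default Stream<T> stream() {
            return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(iterator(), 0), false);
        }
    }

    public static void main(String[] args) {
        List<String> backing = Arrays.asList("a", "b", "c");
        StreamIterable<String> si = backing::iterator;
        // Each stream() call pulls a fresh iterator(), so the two
        // streams are independent.
        System.out.println(si.stream().count()); // 3
        System.out.println(si.stream().count()); // 3
    }
}
```

Whether each `stream()` call really is independent depends entirely on the underlying `iterator()` contract, which is the sticking point discussed below.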
From kevinb at google.com Thu Feb 21 08:06:52 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:06:52 -0800 Subject: Iterable.stream() In-Reply-To: <512640D8.2020500@oracle.com> References: <512640D8.2020500@oracle.com> Message-ID: 1. Yes please. 2. And this time I won't hijack the thread. On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz wrote: > Currently we define stream() and parallelStream() on Collection, with > defaults like: > > default Stream stream() { > return Streams.stream( > () -> Streams.spliterator(iterator(), size(), > Spliterator.SIZED), > Spliterator.SIZED); > } > > In other words, if a Collection does not override stream(), it gets the > stream based on the iterator. > > It has been suggested that we could move stream/parallelStream() up to > Iterable. They could use an almost identical default, except that they > don't know the SIZED flag. (The default in Collection would stay, so > existing inheritors of the Collection default wouldn't see any difference. > (This is why default methods are virtual.)) > > Several people have asked why not move these to Iterable, since some APIs > return "Iterable" as a least-common-denominator aggregate type, and this > would allow those APIs to participate in the stream fun. There are also a > handful of other types that implement Iterable, such as Path > (Iterable) and DirectoryStream (where we'd added an entries() method, > but that would just then become stream()). > > The sole downside is that it creates (yet another) external dependency > from java.lang.Iterable -- now to java.util.stream. > > Thoughts? > > -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com From joe.bowbeer at gmail.com Thu Feb 21 08:14:14 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 08:14:14 -0800 Subject: Iterable.stream() In-Reply-To: References: <512640D8.2020500@oracle.com> Message-ID: When this question was raised 2 weeks ago, you asked: "" Can we make our best attempt to specify Iterable.stream() better than Iterable.iterator() was? I haven't worked out how to say this yet, but the idea is: - If at all possible to ensure that each call to stream() returns an actual working and independent stream, you really really should do that. - If that's just not possible, the second call to stream() really really should throw ISE. "" Is this something we should address? There was no discussion about this last time. On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" wrote: > 1. Yes please. > 2. And this time I won't hijack the thread. > > > On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz wrote: > >> Currently we define stream() and parallelStream() on Collection, with >> defaults like: >> >> default Stream stream() { >> return Streams.stream( >> () -> Streams.spliterator(iterator()**, size(), >> Spliterator.SIZED), >> Spliterator.SIZED); >> } >> >> In other words, if a Collection does not override stream(), it gets the >> stream based on the iterator. >> >> It has been suggested that we could move stream/parallelStream() up to >> Iterable. They could use an almost identical default, except that they >> don't know the SIZED flag. (The default in Collection would stay, so >> existing inheritors of the Collection default wouldn't see any difference. >> (This is why default methods are virtual.)) >> >> Several people have asked why not move these to Iterable, since some APIs >> return "Iterable" as a least-common-denominator aggregate type, and this >> would allow those APIs to participate in the stream fun. 
There are also a >> handful of other types that implement Iterable, such as Path >> (Iterable) and DirectoryStream (where we'd added an entries() method, >> but that would just then become stream()). >> >> The sole downside is that it creates (yet another) external dependency >> from java.lang.Iterable -- now to java.util.stream. >> >> Thoughts? >> >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From brian.goetz at oracle.com Thu Feb 21 08:27:01 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 11:27:01 -0500 Subject: Iterable.stream() In-Reply-To: References: <512640D8.2020500@oracle.com> Message-ID: <51264AD5.7010009@oracle.com> On the other hand, a big argument in favor of this is the simplicity of building our spliterator() on iterator(). Having stream() have different behavior than iterator() would be weird. The iterator() method might do one of: A: give you a fresh iterator every time B: give you the same iterator every time C: throw With the implementation as proposed, the behavior of stream() in these cases would be: A: give you a fresh stream every time B: give you a fresh stream, but which end up sharing the common Iterator C: throw B leads to unpredictable results, but no more nasty than any other case where B happens. (Joe's idea is a good guideline for writing iterator() methods anyway, maybe we should put that into the doc as a suggestion, asking classes that don't behave this way to be polite and document their deviant behavior.) On 2/21/2013 11:14 AM, Joe Bowbeer wrote: > When this question was raised 2 weeks ago, you asked: > > "" > Can we make our best attempt to specify Iterable.stream() better than > Iterable.iterator() was? > > I haven't worked out how to say this yet, but the idea is: > > - If at all possible to ensure that each call to stream() returns an > actual working and independent stream, you really really should do that. 
> - If that's just not possible, the second call to stream() really really > should throw ISE. > "" > > Is this something we should address? There was no discussion about this > last time. > > On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" > wrote: > > 1. Yes please. > 2. And this time I won't hijack the thread. > > > On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz > wrote: > > Currently we define stream() and parallelStream() on Collection, > with defaults like: > > default Stream stream() { > return Streams.stream( > () -> Streams.spliterator(iterator()__, size(), > Spliterator.SIZED), > Spliterator.SIZED); > } > > In other words, if a Collection does not override stream(), it > gets the stream based on the iterator. > > It has been suggested that we could move stream/parallelStream() > up to Iterable. They could use an almost identical default, > except that they don't know the SIZED flag. (The default in > Collection would stay, so existing inheritors of the Collection > default wouldn't see any difference. (This is why default > methods are virtual.)) > > Several people have asked why not move these to Iterable, since > some APIs return "Iterable" as a least-common-denominator > aggregate type, and this would allow those APIs to participate > in the stream fun. There are also a handful of other types that > implement Iterable, such as Path (Iterable) and > DirectoryStream (where we'd added an entries() method, but that > would just then become stream()). > > The sole downside is that it creates (yet another) external > dependency from java.lang.Iterable -- now to java.util.stream. > > Thoughts? > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > > From kevinb at google.com Thu Feb 21 08:33:09 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:33:09 -0800 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> Message-ID: Tardy, but: the Googlers I ran this by all felt just fine with "mapInto". Sure, you can map *multiple, *but that fact just didn't seem overly necessary to force into the name. On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer wrote: > flattenInto gets my vote > On Feb 17, 2013 11:09 AM, "Remi Forax" wrote: > >> On 02/17/2013 08:07 PM, Brian Goetz wrote: >> >>> flattenInto seems the best so far. >>> >> >> +1 >> >> R?mi >> >> >>> On 2/17/2013 9:36 AM, Tim Peierls wrote: >>> >>>> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >>> > wrote: >>>> >>>> mapAndFlattenInto is a little to verbose for me, mapAndFlat ? >>>> >>>> >>>> No, has to be a verb. >>>> >>>> I'd still understand flattenInto, leaving the mapping part to be implied >>>> by the type name. >>>> >>>> --tim >>>> >>> >> -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Thu Feb 21 08:35:58 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 11:35:58 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> Message-ID: <51264CEE.5090003@oracle.com> Is mapInto better than flattenInto? Still trivial to change at this point. On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > Tardy, but: the Googlers I ran this by all felt just fine with > "mapInto". Sure, you can map /multiple, /but that fact just didn't seem > overly necessary to force into the name. 
> > On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > wrote: > > flattenInto gets my vote > > On Feb 17, 2013 11:09 AM, "Remi Forax" > wrote: > > On 02/17/2013 08:07 PM, Brian Goetz wrote: > > flattenInto seems the best so far. > > > +1 > > R?mi > > > On 2/17/2013 9:36 AM, Tim Peierls wrote: > > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > >> > wrote: > > mapAndFlattenInto is a little to verbose for me, > mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping > part to be implied > by the type name. > > --tim > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From joe.bowbeer at gmail.com Thu Feb 21 08:38:31 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 08:38:31 -0800 Subject: Iterable.stream() In-Reply-To: <51264AD5.7010009@oracle.com> References: <512640D8.2020500@oracle.com> <51264AD5.7010009@oracle.com> Message-ID: I was reposting Kevin's earlier question and idea. Delimited with "". On Feb 21, 2013 8:27 AM, "Brian Goetz" wrote: > On the other hand, a big argument in favor of this is the simplicity of > building our spliterator() on iterator(). Having stream() have different > behavior than iterator() would be weird. > > The iterator() method might do one of: > A: give you a fresh iterator every time > B: give you the same iterator every time > C: throw > > With the implementation as proposed, the behavior of stream() in these > cases would be: > A: give you a fresh stream every time > B: give you a fresh stream, but which end up sharing the common Iterator > C: throw > > B leads to unpredictable results, but no more nasty than any other case > where B happens. > > (Joe's idea is a good guideline for writing iterator() methods anyway, > maybe we should put that into the doc as a suggestion, asking classes that > don't behave this way to be polite and document their deviant behavior.) 
> > On 2/21/2013 11:14 AM, Joe Bowbeer wrote: > >> When this question was raised 2 weeks ago, you asked: >> >> "" >> Can we make our best attempt to specify Iterable.stream() better than >> Iterable.iterator() was? >> >> I haven't worked out how to say this yet, but the idea is: >> >> - If at all possible to ensure that each call to stream() returns an >> actual working and independent stream, you really really should do that. >> - If that's just not possible, the second call to stream() really really >> should throw ISE. >> "" >> >> Is this something we should address? There was no discussion about this >> last time. >> >> On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" > > wrote: >> >> 1. Yes please. >> 2. And this time I won't hijack the thread. >> >> >> On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz > > wrote: >> >> Currently we define stream() and parallelStream() on Collection, >> with defaults like: >> >> default Stream stream() { >> return Streams.stream( >> () -> Streams.spliterator(iterator()**__, size(), >> Spliterator.SIZED), >> Spliterator.SIZED); >> } >> >> In other words, if a Collection does not override stream(), it >> gets the stream based on the iterator. >> >> It has been suggested that we could move stream/parallelStream() >> up to Iterable. They could use an almost identical default, >> except that they don't know the SIZED flag. (The default in >> Collection would stay, so existing inheritors of the Collection >> default wouldn't see any difference. (This is why default >> methods are virtual.)) >> >> Several people have asked why not move these to Iterable, since >> some APIs return "Iterable" as a least-common-denominator >> aggregate type, and this would allow those APIs to participate >> in the stream fun. There are also a handful of other types that >> implement Iterable, such as Path (Iterable) and >> DirectoryStream (where we'd added an entries() method, but that >> would just then become stream()). 
>> >> The sole downside is that it creates (yet another) external >> dependency from java.lang.Iterable -- now to java.util.stream. >> >> Thoughts? >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >> >> >> From kevinb at google.com Thu Feb 21 08:42:56 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:42:56 -0800 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: I believe the mapping aspect is an order of magnitude more relevant than the flattening aspect. The way we've designed the API, nothing is exactly being *flattened*, anyway. It's just that multiple results may be emitted. On Thu, Feb 21, 2013 at 8:35 AM, Brian Goetz wrote: > Is mapInto better than flattenInto? Still trivial to change at this point. > > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > >> Tardy, but: the Googlers I ran this by all felt just fine with >> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >> >> overly necessary to force into the name. >> >> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > **> wrote: >> >> flattenInto gets my vote >> >> On Feb 17, 2013 11:09 AM, "Remi Forax" > > wrote: >> >> On 02/17/2013 08:07 PM, Brian Goetz wrote: >> >> flattenInto seems the best so far. >> >> >> +1 >> >> R?mi >> >> >> On 2/17/2013 9:36 AM, Tim Peierls wrote: >> >> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >> >> >> >> >> wrote: >> >> mapAndFlattenInto is a little to verbose for me, >> mapAndFlat ? >> >> >> No, has to be a verb. >> >> I'd still understand flattenInto, leaving the mapping >> part to be implied >> by the type name. >> >> --tim >> >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> >> > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Thu Feb 21 08:41:37 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:41:37 -0800 Subject: Iterable.stream() In-Reply-To: References: <512640D8.2020500@oracle.com> Message-ID: On Thu, Feb 21, 2013 at 8:14 AM, Joe Bowbeer wrote: > Is this something we should address? There was no discussion about this > last time. > I still think it is. It's true that anyone who inherits the *default *stream() will get one that's only as good as their (possibly lousy) iterator always is, but that's the best we can do. > On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" wrote: > >> 1. Yes please. >> 2. And this time I won't hijack the thread. >> >> >> On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz wrote: >> >>> Currently we define stream() and parallelStream() on Collection, with >>> defaults like: >>> >>> default Stream stream() { >>> return Streams.stream( >>> () -> Streams.spliterator(iterator()**, size(), >>> Spliterator.SIZED), >>> Spliterator.SIZED); >>> } >>> >>> In other words, if a Collection does not override stream(), it gets the >>> stream based on the iterator. >>> >>> It has been suggested that we could move stream/parallelStream() up to >>> Iterable. They could use an almost identical default, except that they >>> don't know the SIZED flag. (The default in Collection would stay, so >>> existing inheritors of the Collection default wouldn't see any difference. >>> (This is why default methods are virtual.)) >>> >>> Several people have asked why not move these to Iterable, since some >>> APIs return "Iterable" as a least-common-denominator aggregate type, and >>> this would allow those APIs to participate in the stream fun. 
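The default sketched in Brian's proposal above is built on the in-flight `Streams.stream`/`Streams.spliterator` helpers. As a hypothetical sketch of what an Iterable-based default could look like using the spliterator utilities that eventually shipped in JDK 8 (`StreamSupport`/`Spliterators`; note that in the final API `stream()` stayed on `Collection`, so `StreamIterable` below is an invented illustration, not a real JDK type):

```java
import java.util.Arrays;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical: what a default stream() on Iterable could look like,
// built on iterator() as described in the thread. A bare Iterable has
// no size information, so the spliterator cannot report SIZED.
interface StreamIterable<T> extends Iterable<T> {
    default Stream<T> stream() {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator(), 0),
                false); // sequential; a parallelStream() default would pass true
    }
}

public class IterableStreamDemo {
    public static void main(String[] args) {
        // Iterable is a single-abstract-method type, so a lambda works here.
        StreamIterable<String> it =
                () -> Arrays.asList("a", "bb", "ccc").iterator();
        long n = it.stream().filter(s -> s.length() > 1).count();
        System.out.println(n); // 2
    }
}
```

As the thread notes, such a stream is only as good as the underlying iterator(): if iterator() returns a shared or one-shot iterator, each call to stream() silently inherits that behavior.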
There are >>> also a handful of other types that implement Iterable, such as Path >>> (Iterable) and DirectoryStream (where we'd added an entries() method, >>> but that would just then become stream()). >>> >>> The sole downside is that it creates (yet another) external dependency >>> from java.lang.Iterable -- now to java.util.stream. >>> >>> Thoughts? >>> >>> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com >> > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From Donald.Raab at gs.com Thu Feb 21 08:42:22 2013 From: Donald.Raab at gs.com (Raab, Donald) Date: Thu, 21 Feb 2013 11:42:22 -0500 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C3A898A8@GSCMAMP09EX.firmwide.corp.gs.com> Is there anything wrong with flatMapInto? Apologies if this was already covered and dismissed. > -----Original Message----- > From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda- > libs-spec-experts-bounces at openjdk.java.net] On Behalf Of Brian Goetz > Sent: Thursday, February 21, 2013 11:36 AM > To: Kevin Bourrillion > Cc: lambda-libs-spec-experts at openjdk.java.net > Subject: Re: FlatMapper > > Is mapInto better than flattenInto? Still trivial to change at this > point. > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > > Tardy, but: the Googlers I ran this by all felt just fine with > > "mapInto". Sure, you can map /multiple, /but that fact just didn't > > seem overly necessary to force into the name. > > > > On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > > wrote: > > > > flattenInto gets my vote > > > > On Feb 17, 2013 11:09 AM, "Remi Forax" > > wrote: > > > > On 02/17/2013 08:07 PM, Brian Goetz wrote: > > > > flattenInto seems the best so far. 
> > > > > > +1 > > > > R?mi > > > > > > On 2/17/2013 9:36 AM, Tim Peierls wrote: > > > > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > > > mlv.fr>>> > > wrote: > > > > mapAndFlattenInto is a little to verbose for me, > > mapAndFlat ? > > > > > > No, has to be a verb. > > > > I'd still understand flattenInto, leaving the mapping > > part to be implied > > by the type name. > > > > --tim > > > > > > > > > > > > -- > > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > > From joe.bowbeer at gmail.com Thu Feb 21 08:42:27 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 08:42:27 -0800 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: I prefer flattenInto. On Feb 21, 2013 8:36 AM, "Brian Goetz" wrote: > Is mapInto better than flattenInto? Still trivial to change at this point. > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > >> Tardy, but: the Googlers I ran this by all felt just fine with >> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >> overly necessary to force into the name. >> >> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > **> wrote: >> >> flattenInto gets my vote >> >> On Feb 17, 2013 11:09 AM, "Remi Forax" > > wrote: >> >> On 02/17/2013 08:07 PM, Brian Goetz wrote: >> >> flattenInto seems the best so far. >> >> >> +1 >> >> R?mi >> >> >> On 2/17/2013 9:36 AM, Tim Peierls wrote: >> >> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >> >> >> >> wrote: >> >> mapAndFlattenInto is a little to verbose for me, >> mapAndFlat ? >> >> >> No, has to be a verb. >> >> I'd still understand flattenInto, leaving the mapping >> part to be implied >> by the type name. >> >> --tim >> >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> >> > From kevinb at google.com Thu Feb 21 08:43:36 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 21 Feb 2013 08:43:36 -0800 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: emit()? On Thu, Feb 21, 2013 at 8:42 AM, Kevin Bourrillion wrote: > I believe the mapping aspect is an order of magnitude more relevant than > the flattening aspect. The way we've designed the API, nothing is exactly > being *flattened*, anyway. It's just that multiple results may be > emitted. > > > On Thu, Feb 21, 2013 at 8:35 AM, Brian Goetz wrote: > >> Is mapInto better than flattenInto? Still trivial to change at this >> point. >> >> >> On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: >> >>> Tardy, but: the Googlers I ran this by all felt just fine with >>> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >>> >>> overly necessary to force into the name. >>> >>> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer >> **> wrote: >>> >>> flattenInto gets my vote >>> >>> On Feb 17, 2013 11:09 AM, "Remi Forax" >> > wrote: >>> >>> On 02/17/2013 08:07 PM, Brian Goetz wrote: >>> >>> flattenInto seems the best so far. >>> >>> >>> +1 >>> >>> R?mi >>> >>> >>> On 2/17/2013 9:36 AM, Tim Peierls wrote: >>> >>> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >>> >>> >> >>> >>> wrote: >>> >>> mapAndFlattenInto is a little to verbose for me, >>> mapAndFlat ? >>> >>> >>> No, has to be a verb. >>> >>> I'd still understand flattenInto, leaving the mapping >>> part to be implied >>> by the type name. >>> >>> --tim >>> >>> >>> >>> >>> >>> -- >>> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >>> >>> >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From tim at peierls.net Thu Feb 21 08:46:34 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 21 Feb 2013 11:46:34 -0500 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: Yes, mapInto is better than flattenInto. On Thu, Feb 21, 2013 at 11:35 AM, Brian Goetz wrote: > Is mapInto better than flattenInto? Still trivial to change at this point. > > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > >> Tardy, but: the Googlers I ran this by all felt just fine with >> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >> >> overly necessary to force into the name. >> >> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > **> wrote: >> >> flattenInto gets my vote >> >> On Feb 17, 2013 11:09 AM, "Remi Forax" > > wrote: >> >> On 02/17/2013 08:07 PM, Brian Goetz wrote: >> >> flattenInto seems the best so far. >> >> >> +1 >> >> R?mi >> >> >> On 2/17/2013 9:36 AM, Tim Peierls wrote: >> >> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >> >> >> >> >> wrote: >> >> mapAndFlattenInto is a little to verbose for me, >> mapAndFlat ? >> >> >> No, has to be a verb. >> >> I'd still understand flattenInto, leaving the mapping >> part to be implied >> by the type name. >> >> --tim >> >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com >> >> > From joe.bowbeer at gmail.com Thu Feb 21 08:47:35 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 08:47:35 -0800 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: Let's go back to mapAndFlattenInto and try this exercise again! Last time we ended up at flattenInto, but maybe we took a wrong turn near the start? On Feb 21, 2013 8:43 AM, "Kevin Bourrillion" wrote: > emit()? > > > On Thu, Feb 21, 2013 at 8:42 AM, Kevin Bourrillion wrote: > >> I believe the mapping aspect is an order of magnitude more relevant than >> the flattening aspect. The way we've designed the API, nothing is exactly >> being *flattened*, anyway. It's just that multiple results may be >> emitted. >> >> >> On Thu, Feb 21, 2013 at 8:35 AM, Brian Goetz wrote: >> >>> Is mapInto better than flattenInto? Still trivial to change at this >>> point. >>> >>> >>> On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: >>> >>>> Tardy, but: the Googlers I ran this by all felt just fine with >>>> "mapInto". Sure, you can map /multiple, /but that fact just didn't seem >>>> >>>> overly necessary to force into the name. >>>> >>>> On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer >>> **> wrote: >>>> >>>> flattenInto gets my vote >>>> >>>> On Feb 17, 2013 11:09 AM, "Remi Forax" >>> > wrote: >>>> >>>> On 02/17/2013 08:07 PM, Brian Goetz wrote: >>>> >>>> flattenInto seems the best so far. >>>> >>>> >>>> +1 >>>> >>>> R?mi >>>> >>>> >>>> On 2/17/2013 9:36 AM, Tim Peierls wrote: >>>> >>>> On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax >>>> >>>> >> >>>> >>>> wrote: >>>> >>>> mapAndFlattenInto is a little to verbose for me, >>>> mapAndFlat ? >>>> >>>> >>>> No, has to be a verb. 
>>>> >>>> I'd still understand flattenInto, leaving the mapping >>>> part to be implied >>>> by the type name. >>>> >>>> --tim >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >>>> >>>> >>> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com >> > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From spullara at gmail.com Thu Feb 21 08:56:35 2013 From: spullara at gmail.com (Sam Pullara) Date: Thu, 21 Feb 2013 08:56:35 -0800 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: I like mapInto as well. Sam On Feb 21, 2013, at 8:46 AM, Tim Peierls wrote: > Yes, mapInto is better than flattenInto. > > On Thu, Feb 21, 2013 at 11:35 AM, Brian Goetz wrote: > Is mapInto better than flattenInto? Still trivial to change at this point. > > > On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > Tardy, but: the Googlers I ran this by all felt just fine with > "mapInto". Sure, you can map /multiple, /but that fact just didn't seem > > overly necessary to force into the name. > > On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer > wrote: > > flattenInto gets my vote > > On Feb 17, 2013 11:09 AM, "Remi Forax" > wrote: > > On 02/17/2013 08:07 PM, Brian Goetz wrote: > > flattenInto seems the best so far. > > > +1 > > R?mi > > > On 2/17/2013 9:36 AM, Tim Peierls wrote: > > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > >> > > wrote: > > mapAndFlattenInto is a little to verbose for me, > mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping > part to be implied > by the type name. > > --tim > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > > From joe.bowbeer at gmail.com Thu Feb 21 09:18:01 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 21 Feb 2013 09:18:01 -0800 Subject: FlatMapper In-Reply-To: <51264CEE.5090003@oracle.com> References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: Last time we were looking for a descriptive name not necessarily a great name. flattenInto does a good job of referencing its interface, while mapInto is more ambiguous in that respect. Is mapInto more easily confused with other names such as 'collect'? Is mapInto better than flattenInto? Still trivial to change at this point. On 2/21/2013 11:33 AM, Kevin Bourrillion wrote: > Tardy, but: the Googlers I ran this by all felt just fine with > "mapInto". Sure, you can map /multiple, /but that fact just didn't seem > overly necessary to force into the name. > > On Sun, Feb 17, 2013 at 12:29 PM, Joe Bowbeer **> wrote: > > flattenInto gets my vote > > On Feb 17, 2013 11:09 AM, "Remi Forax" > wrote: > > On 02/17/2013 08:07 PM, Brian Goetz wrote: > > flattenInto seems the best so far. > > > +1 > > R?mi > > > On 2/17/2013 9:36 AM, Tim Peierls wrote: > > On Sun, Feb 17, 2013 at 6:17 AM, Remi Forax > > >> > wrote: > > mapAndFlattenInto is a little to verbose for me, > mapAndFlat ? > > > No, has to be a verb. > > I'd still understand flattenInto, leaving the mapping > part to be implied > by the type name. > > --tim > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. 
|kevinb at google.com > > From tim at peierls.net Thu Feb 21 09:39:27 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 21 Feb 2013 12:39:27 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: On Thu, Feb 21, 2013 at 12:18 PM, Joe Bowbeer wrote: > Is mapInto more easily confused with other names such as 'collect'? > No, I don't think so. As Kevin pointed out, there's not enough new going on here to deserve a fancy new (scary) name. We're taking a stream of T values and *map*ping each one to zero or more U instances and putting them *into* a U consumer. In the general case, it's not really flattening (or exploding), even though what I would describe in those terms is a special case of this. FlatMapper is a *synecdoche*, a more specific term standing in for a more general concept, and if it makes Scala devotees happy, then I guess it does no harm. But (and I know it must seem like I'm reversing myself, since I suggested "flattenInto") I don't see a need to repeat the favor in the method name. --tim From dl at cs.oswego.edu Thu Feb 21 09:44:43 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 21 Feb 2013 12:44:43 -0500 Subject: FlatMapper In-Reply-To: References: <511A8CEF.8070800@oracle.com> <511D4F60.4040407@oracle.com> <511E82F8.1060509@oracle.com> <5120BC32.3000309@univ-mlv.fr> <51212A8F.7000109@oracle.com> <51212A24.2020101@univ-mlv.fr> <51264CEE.5090003@oracle.com> Message-ID: <51265D0B.6010501@cs.oswego.edu> On 02/21/13 12:39, Tim Peierls wrote: > FlatMapper is a /synecdoche/, a more specific term standing in for a more > general concept, It's always a great day when you can use "synecdoche"! (Not only because it fits, but the imagery of plays within movies within ...) 
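For context on the naming debate above: the operation in question (FlatMapper, whether its method ends up as mapInto, flattenInto, or emit) maps each input element to zero or more output elements pushed into a downstream consumer. The FlatMapper-based overloads were still in flux at this point; a rough sketch of the same "zero-or-more results per element" behavior, expressed with the `Stream.flatMap` form that ultimately shipped in JDK 8:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapDemo {
    // Each input element may emit zero or more output elements -- the
    // behavior the FlatMapper interface expresses by handing the mapper
    // a downstream consumer. Here it is written with a Stream-returning
    // function instead.
    static List<Integer> explode(List<Integer> xs) {
        return xs.stream()
                 .flatMap(x -> x == 0 ? Stream.<Integer>empty()  // zero results
                                      : Stream.of(x, -x))        // multiple results
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(explode(Arrays.asList(1, 0, 2))); // [1, -1, 2, -2]
    }
}
```

Kevin's point upthread is visible here: nothing nested is being "flattened" by the user; each element simply emits several results, which is why the mapping aspect arguably dominates the name.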
-Doug From brian.goetz at oracle.com Thu Feb 21 11:17:58 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 14:17:58 -0500 Subject: Code review request Message-ID: <512672E6.1050708@oracle.com> At http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ I've posted a webrev for about half the classes in java.util.stream. None of these are public classes, so there are no public API issues here, but plenty of internal API issues, naming issues (ooh, a bikeshed), and code quality issues. From forax at univ-mlv.fr Thu Feb 21 11:33:37 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 21 Feb 2013 20:33:37 +0100 Subject: Iterable.stream() In-Reply-To: References: <512640D8.2020500@oracle.com> Message-ID: <51267691.7000206@univ-mlv.fr> On 02/21/2013 05:41 PM, Kevin Bourrillion wrote: > On Thu, Feb 21, 2013 at 8:14 AM, Joe Bowbeer > wrote: > > Is this something we should address? There was no discussion > about this last time. > > I still think it is. It's true that anyone who inherits the /default > /stream() will get one that's only as good as their (possibly lousy) > iterator always is, but that's the best we can do. We provide a way to get a Spliterator from an Iterator and a Stream from a Spliterator, so we already provide a way to get a Stream from an Iterable but with no way to get a better stream if the Iterable is a Collection. so Iterable should have a method stream(), default methods are virtual exactly for that case. R?mi > > On Feb 21, 2013 8:07 AM, "Kevin Bourrillion" > wrote: > > 1. Yes please. > 2. And this time I won't hijack the thread. 
> > > On Thu, Feb 21, 2013 at 7:44 AM, Brian Goetz > > wrote: > > Currently we define stream() and parallelStream() on > Collection, with defaults like: > > default Stream stream() { > return Streams.stream( > () -> Streams.spliterator(iterator(), size(), > Spliterator.SIZED), > Spliterator.SIZED); > } > > In other words, if a Collection does not override > stream(), it gets the stream based on the iterator. > > It has been suggested that we could move > stream/parallelStream() up to Iterable. They could use an > almost identical default, except that they don't know the > SIZED flag. (The default in Collection would stay, so > existing inheritors of the Collection default wouldn't see > any difference. (This is why default methods are virtual.)) > > Several people have asked why not move these to Iterable, > since some APIs return "Iterable" as a > least-common-denominator aggregate type, and this would > allow those APIs to participate in the stream fun. There > are also a handful of other types that implement Iterable, > such as Path (Iterable) and DirectoryStream (where > we'd added an entries() method, but that would just then > become stream()). > > The sole downside is that it creates (yet another) > external dependency from java.lang.Iterable -- now to > java.util.stream. > > Thoughts? > > > > > -- > Kevin Bourrillion | Java Librarian | Google, > Inc. |kevinb at google.com > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com > From brian.goetz at oracle.com Thu Feb 21 15:01:30 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Feb 2013 18:01:30 -0500 Subject: Collectors inventory Message-ID: <5126A74A.3040509@oracle.com> As I promised a long time ago, here's an overview of what's in Collectors currently. 
There are 12 basic forms:

- toCollection(ctor)
- toList()
- toSet()
- toStringBuilder()
- toStringJoiner(delimiter)
- to{Long,Double}Statistics

- groupingBy(classifier, mapFactory, downstream collector)
- groupingReduce(classifier, mapFactory, mapper, reducer)
- mapping(mappingFn, downstream collector)
- joiningWith(mappingFunction, mergeFunction, mapFactory)
- partitioningBy(predicate, downstream collector)
- partitioningReduce(predicate, mapper, reducer)

The toXxx forms should be obvious.

Mapping has four versions, analogous to Stream.map:
- mapping(T -> U, Collector)
- mapping(T -> int, Collector.OfInt)
- mapping(T -> long, Collector.OfLong)
- mapping(T -> double, Collector.OfDouble)

GroupingBy has four forms:
- groupingBy(T->K) -- standard groupBy, values of resulting Map are Collection<T>
- Same, but with explicit constructors for map and for rows (so you can produce, say, a TreeMap<K, Collection<T>> and not just a Map<K, Collection<T>>)
- groupingBy(T->K, Collector) -- multi-level groupBy, where downstream is another Collector
- Same, but with explicit ctor for map

GroupingReduce has four forms:
- groupingReduce(T->K, BinaryOperator) // simple reduce
- groupingReduce(T->K, Function, BinaryOperator) // map-reduce
- above two with explicit map ctors

JoiningWith has four forms:
- joiningWith(T->U)
- same, but with explicit Map ctor
- same, but with merge function for handling duplicates
- same, with both explicit map ctor and merge function

PartitioningBy has three forms:
- partitioningBy(Predicate)
- Same, but with explicit constructor for Collection (so you can get a Map<Boolean, TreeSet<T>>)
- partitioningBy(Predicate, Collector) // multi-level

PartitioningReduce has two forms:
- predicate + reducer
- predicate + mapper + reducer

Impl note: in any category, all but one are one-liners that delegate to the general form. Plus, all the Map-bearing ones have a concurrent and non-concurrent version.
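A sketch of the groupingBy and partitioningBy forms from the inventory above, written against the collector names that eventually shipped in JDK 8 (assumed here; several of the inventoried names were later renamed, e.g. the groupingReduce forms became reducing-based overloads and joiningWith became toMap):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.partitioningBy;
import static java.util.stream.Collectors.toList;

public class GroupingDemo {
    // Multi-level form: classifier plus a downstream collector.
    static Map<Character, Long> countByInitial(List<String> words) {
        return words.stream().collect(groupingBy(w -> w.charAt(0), counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "cherry", "blueberry");

        // Simple groupBy: values of the resulting Map are List<String>
        Map<Character, List<String>> byInitial =
                words.stream().collect(groupingBy(w -> w.charAt(0)));

        // Explicit map ctor: produce a TreeMap, not just some Map
        TreeMap<Character, List<String>> sorted =
                words.stream().collect(groupingBy(w -> w.charAt(0), TreeMap::new, toList()));

        // partitioningBy: two-way split keyed by Boolean
        Map<Boolean, List<String>> partitioned =
                words.stream().collect(partitioningBy(w -> w.length() > 5));

        System.out.println(countByInitial(words)); // e.g. {a=2, b=2, c=1}
        System.out.println(sorted.firstKey());     // a
        System.out.println(partitioned.get(false)); // [apple]
    }
}
```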
From kevinb at google.com Fri Feb 22 08:06:17 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 22 Feb 2013 08:06:17 -0800 Subject: A few very minor library issues Message-ID: Just a few little things. 1. I feel the Stream methods findFirst() and findAny() can really be named just first() and any(). The "find" is just odd and doesn't do enough. Failing that, I'd go for firstElement() / anyElement(). 2. I like Stream.substream(), but Stream.sub*S*tream() is undeniably consistent with the collections API (subSet, etc.; sure, String.substring() doesn't follow that, but it's "farther away"). I'm actually on the fence here, because I think "substream" is strictly the *correct* way to camel-case the word "substream"... 3. Are we concerned that the name Map.computeIfAbsent() obscures what the mutative effect on the map is? -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 22 08:25:41 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 22 Feb 2013 11:25:41 -0500 Subject: A few very minor library issues In-Reply-To: References: Message-ID: <51279C05.1090007@oracle.com> > 1. I feel the Stream methods findFirst() and findAny() can really be > named just first() and any(). The "find" is just odd and doesn't do > enough. Failing that, I'd go for firstElement() / anyElement(). Agree find is a little weird. I am fine with first() but a little squeamish about any(), just because people who have not yet been through the parallelism meat grinder already find "findAny" weird ("why is it different from findFirst?") Also OK with firstElement() and anyElement(). > 2. I like Stream.substream(), but Stream.sub*S*tream() is undeniably > consistent with the collections API (subSet, etc.; sure, > String.substring() doesn't follow that, but it's "farther away"). I'm > actually on the fence here, because I think "substream" is strictly the > /correct/ way to camel-case the word "substream"... 
No strong opinion here. What do people want? From joe.bowbeer at gmail.com Fri Feb 22 08:56:28 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Fri, 22 Feb 2013 08:56:28 -0800 Subject: A few very minor library issues In-Reply-To: References: Message-ID: I'm fine with the find* methods as they are. It wasn't a problem finding them and using them in the examples I wrote. The common prefix is a help for grouping these common methods, and these both return an Option thing, so the common prefix is also a helpful reminder there. Just so you know, after we have discussed names several times over several months and I have already coded the choices into examples, I tend to feel pretty good about the names and am reluctant to want to change them:-) I like substream, too. I'm OK with computeIfAbsent. After years of discussion, it is what it is. On Feb 22, 2013 8:06 AM, "Kevin Bourrillion" wrote: > Just a few little things. > > 1. I feel the Stream methods findFirst() and findAny() can really be named > just first() and any(). The "find" is just odd and doesn't do enough. > Failing that, I'd go for firstElement() / anyElement(). > > 2. I like Stream.substream(), but Stream.sub*S*tream() is undeniably > consistent with the collections API (subSet, etc.; sure, String.substring() > doesn't follow that, but it's "farther away"). I'm actually on the fence > here, because I think "substream" is strictly the *correct* way to > camel-case the word "substream"... > > 3. Are we concerned that the name Map.computeIfAbsent() obscures what the > mutative effect on the map is? > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From kevinb at google.com Fri Feb 22 08:59:10 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Fri, 22 Feb 2013 08:59:10 -0800 Subject: A few very minor library issues In-Reply-To: <51279C05.1090007@oracle.com> References: <51279C05.1090007@oracle.com> Message-ID: On Fri, Feb 22, 2013 at 8:25 AM, Brian Goetz wrote: 1. 
I feel the Stream methods findFirst() and findAny() can really be >> named just first() and any(). The "find" is just odd and doesn't do >> enough. Failing that, I'd go for firstElement() / anyElement(). >> > > Agree find is a little weird. I am fine with first() but a little > squeamish about any(), just because people who have not yet been through > the parallelism meat grinder already find "findAny" weird ("why is it > different from findFirst?") > That seems like a concern that's roughly the same whether they have a common prefix or suffix or not. Though I do see the minor point about a common prefix grouping them together so that you at least have to ponder the difference up front... > Also OK with firstElement() and anyElement(). > > 2. I like Stream.substream(), but Stream.sub*S*tream() is undeniably >> >> consistent with the collections API (subSet, etc.; sure, >> String.substring() doesn't follow that, but it's "farther away"). I'm >> actually on the fence here, because I think "substream" is strictly the >> /correct/ way to camel-case the word "substream"... >> > > No strong opinion here. What do people want? > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Fri Feb 22 15:32:14 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 22 Feb 2013 18:32:14 -0500 Subject: Initial spec review for Stream Message-ID: <5127FFFE.5010407@oracle.com> I've put up some very rough proto-spec for Stream and the stream package-info at: http://cr.openjdk.java.net/~briangoetz/JDK-8008682/doc/. (I've included the whole package but am only requesting comments on these two files for now, as the rest are incomplete.) 
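[Editorial sketch inserted for context: the behavioral difference behind the findFirst/findAny naming debate is about encounter order under parallelism. The class name below is invented for illustration; the stream methods are the ones under discussion.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FindDemo {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5);

        // findFirst() is constrained by encounter order: it must return the
        // first match in list order, even when the stream runs in parallel.
        Optional<Integer> first = nums.parallelStream().filter(n -> n > 2).findFirst();

        // findAny() may return whichever match a parallel worker reaches
        // first; only "some element greater than 2" is guaranteed.
        Optional<Integer> any = nums.parallelStream().filter(n -> n > 2).findAny();

        System.out.println("first=" + first.get());
        System.out.println("anyMatches=" + (any.isPresent() && any.get() > 2));
    }
}
```

This is why "findAny" reads oddly next to "findFirst": the common prefix groups the two, but their contracts only diverge when the pipeline is parallel.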
There's definitely lots of stuff missing, including:

- Describe the difference between sequential and parallel streams
- More general information about reduce, better definitions for associativity, more description of how reduce employs parallelism, more examples
- Role of stream flags in various operations, specifically ordering
- Non-interference and constraints on lambda characteristics (e.g., side-effect-freedom)
- collectUnordered

But it's a start. Comments please! From forax at univ-mlv.fr Sat Feb 23 01:40:37 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 23 Feb 2013 10:40:37 +0100 Subject: Collectors inventory In-Reply-To: <5126A74A.3040509@oracle.com> References: <5126A74A.3040509@oracle.com> Message-ID: <51288E95.4010903@univ-mlv.fr> On 02/22/2013 12:01 AM, Brian Goetz wrote: > As I promised a long time ago, here's an overview of what's in > Collectors currently. I think there are too many methods in Collectors; we should restrain ourselves to 2 forms (3 max).
>
> There are 12 basic forms:
> - toCollection(ctor)
> - toList()
> - toSet()
> - toStringBuilder()
> - toStringJoiner(delimiter)
> - to{Long,Double}Statistics
>
> - groupingBy(classifier, mapFactory, downstream collector)
> - groupingReduce(classifier, mapFactory, mapper, reducer)
> - mapping(mappingFn, downstream collector)
> - joiningWith(mappingFunction, mergeFunction, mapFactory)
> - partitioningBy(predicate, downstream collector)
> - partitioningReduce(predicate, mapper, reducer)
>
> The toXxx forms should be obvious.
> Mapping has four versions, analogous to Stream.map:
> - mapping(T -> U, Collector)
> - mapping(T -> int, Collector.OfInt)
> - mapping(T -> long, Collector.OfLong)
> - mapping(T -> double, Collector.OfDouble)
>
> GroupingBy has four forms:
> - groupingBy(T->K) -- standard groupBy, values of resulting Map are Collection<T>
> - Same, but with explicit constructors for map and for rows (so you can produce, say, a TreeMap<K, List<T>> and not just a Map<K, Collection<T>>)
> - groupingBy(T->K, Collector) -- multi-level groupBy, where downstream is another Collector
> - Same, but with explicit ctor for map

You can remove the third one, given you have the one with an explicit constructor.

> GroupingReduce has four forms:
> - groupingReduce(T->K, BinaryOperator) // simple reduce
> - groupingReduce(T->K, Function, BinaryOperator) // map-reduce
> - above two with explicit map ctors

Keep only the ones with explicit constructors.

> JoiningWith has four forms:
> - joiningWith(T->U)
> - same, but with explicit Map ctor
> - same, but with merge function for handling duplicates
> - same, with both explicit map ctor and merge function

Remove the third one.

> PartitioningBy has three forms:
> - partitioningBy(Predicate)
> - Same, but with explicit constructor for Collection (so you can get a Map<Boolean, List<T>>)
> - partitioningBy(Predicate, Collector) // multi-level
>
> PartitioningReduce has two forms:
> - predicate + reducer
> - predicate + mapper + reducer
>
> Impl note: in any category, all but one are one-liners that delegate to the general form.
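[Editorial sketch inserted for context: the inventory above predates the final API, but the groupingBy/mapping/partitioningBy shapes can be illustrated with the names that java.util.stream.Collectors ultimately shipped. The word list and class name are invented; the groupingReduce, joiningWith, and toStringJoiner forms are not shown.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CollectorsDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");

        // groupingBy(classifier): the simple form, values are List<String>.
        Map<Character, List<String>> byInitial = words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0)));
        System.out.println(byInitial.get('a'));

        // groupingBy with an explicit map factory and a downstream collector:
        // the "multi-level groupBy" form, here producing a TreeMap of lengths.
        TreeMap<Character, List<Integer>> lengths = words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), TreeMap::new,
                        Collectors.mapping(String::length, Collectors.toList())));
        System.out.println(lengths);

        // partitioningBy(predicate): a two-way split keyed by Boolean.
        Map<Boolean, List<String>> shortWords = words.stream()
                .collect(Collectors.partitioningBy(w -> w.length() <= 6));
        System.out.println(shortWords.get(true));
    }
}
```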
Rémi From forax at univ-mlv.fr Sat Feb 23 02:51:41 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 23 Feb 2013 11:51:41 +0100 Subject: Code review request In-Reply-To: <512672E6.1050708@oracle.com> References: <512672E6.1050708@oracle.com> Message-ID: <51289F3D.1010609@univ-mlv.fr> On 02/21/2013 08:17 PM, Brian Goetz wrote: > At > http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ > > I've posted a webrev for about half the classes in java.util.stream. > None of these are public classes, so there are no public API issues > here, but plenty of internal API issues, naming issues (ooh, a > bikeshed), and code quality issues. > Hi Brian, All protected fields should not be protected but package visible. Classes are package private, so there is no need to use a modifier which offers a wider visibility. The same is true for constructors. For default methods, some of them are marked public and some are not; what does the coding convention say? Code convention again: there is a lot of if/else with no curly braces, or only curly braces on the if part but not on the else part. Also, when an if block ends with a return, there is no need to use 'else':

if (result != null) {
    foundResult(result);
    return result;
}
else
    return null;

can be simply written:

if (result != null) {
    foundResult(result);
    return result;
}
return null;

All inner classes should not have private constructors, like for example FindOp.FindTask, because the compiler will have to generate a special accessor for them when they are called from the outer class. In AbstractShortCircuitTask: It's not clear that cancel and sharedResult can be accessed directly, given that they both have methods that act as getter and setter. If they can be accessed directly, I think it's better to declare them private and to use getters. Depending on the ops, some of them do null checks of arguments at creation time (ForEachOp), some of them don't (FindOp). In ForEachUntilOp, the 'consumer' is checked but 'until' is not.
in ForEachOp, most of the keyword protected are not needed, ForEachUntilOp which inherits from ForEachOp is in the same package. In ForEachUntilOp, the constructor should be private (like for all the other ops). In MatchOp, line 110, I think the compiler bug is fixed now ? The enum MatchKind should not be public and all constructor of all inner classes should not be private. In OpsUtils, some static methods are public and some are not, in Tripwire: enabled should be in uppercase (ENABLED). method trip() should be: public static void trip(Class trippingClass, String msg) cheers, R?mi From tim at peierls.net Sat Feb 23 09:06:05 2013 From: tim at peierls.net (Tim Peierls) Date: Sat, 23 Feb 2013 12:06:05 -0500 Subject: Code review request In-Reply-To: <512672E6.1050708@oracle.com> References: <512672E6.1050708@oracle.com> Message-ID: On Thu, Feb 21, 2013 at 2:17 PM, Brian Goetz wrote: > At > http://cr.openjdk.java.net/~**briangoetz/jdk-8008670/webrev/ > > I've posted a webrev for about half the classes in java.util.stream. None > of these are public classes, so there are no public API issues here, but > plenty of internal API issues, naming issues (ooh, a bikeshed), and code > quality issues. > Things I noticed before I ran out of steam: In AbstractTask the use of multicharacter type parameters is confusing, especially with an underscore. AbstractTask, , or even would be better. BiBlock -> BiConsumer in Map.java comments. --tim From joe.bowbeer at gmail.com Sat Feb 23 11:42:30 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 23 Feb 2013 11:42:30 -0800 Subject: Code review request In-Reply-To: <512672E6.1050708@oracle.com> References: <512672E6.1050708@oracle.com> Message-ID: We should send these comments in emails? I don't see a way to comment at the link provided. I repeat some of Remi's comments regarding formatting below. File: http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/src/share/classes/java/util/Map.java.patch 1. 
Please run this through a code formatter to conform with Oracle's standard. Things to fix: parameter wrapping should indent only 8 spaces: + default V merge(K key, V value, + BiFunction remappingFunction) { if-else brace should be on same line: + } + else if ((newValue = remappingFunction.apply(oldValue, value)) != null) { multi-line 'if' always needs braces? + if (replace(key, oldValue, newValue)) + return newValue; 2. replaceAll javadoc: Function#map => Function#apply calling the function's {@code Function#map} method => calling the function's {@code Function#apply} method 3. replaceAll question What's with all the finals? + final Iterator> entries = entrySet().iterator(); + while (entries.hasNext()) { + final Map.Entry entry = entries.next(); + entry.setValue(function.apply(entry.getKey(), entry.getValue())); + } Why not code this as follows, just like forEach? + for (Map.Entry entry : entrySet()) { + entry.setValue(function.apply(entry.getKey(), entry.getValue())); + } --Joe On Thu, Feb 21, 2013 at 11:17 AM, Brian Goetz wrote: > At > http://cr.openjdk.java.net/~**briangoetz/jdk-8008670/webrev/ > > I've posted a webrev for about half the classes in java.util.stream. None > of these are public classes, so there are no public API issues here, but > plenty of internal API issues, naming issues (ooh, a bikeshed), and code > quality issues. > > From joe.bowbeer at gmail.com Sun Feb 24 13:09:45 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sun, 24 Feb 2013 13:09:45 -0800 Subject: Code review request In-Reply-To: References: <512672E6.1050708@oracle.com> Message-ID: A few more comments. 1. General: The method descriptions should be written 3rd person declarative, according to Oracle's style guide http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#styleguide This is not followed in many places. 
For example: Get the {@code StreamShape} describing the input shape of the pipeline => Gets the {@code StreamShape} describing the input shape of the pipeline. 2. Typo (missing space) in PipelineHelper javadoc: 40 * the last intermediate operation described by this {@code PipelineHelper}.The 3. StreamShape enum is missing its per-element javadoc --Joe On Sat, Feb 23, 2013 at 11:42 AM, Joe Bowbeer wrote: > We should send these comments in emails? I don't see a way to comment at > the link provided. > > I repeat some of Remi's comments regarding formatting below. > > File: > > > http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/src/share/classes/java/util/Map.java.patch > > 1. Please run this through a code formatter to conform with Oracle's > standard. Things to fix: > > parameter wrapping should indent only 8 spaces: > > + default V merge(K key, V value, > + BiFunction > remappingFunction) { > > if-else brace should be on same line: > > + } > + else if ((newValue = remappingFunction.apply(oldValue, value)) != null) { > > multi-line 'if' always needs braces? > > + if (replace(key, oldValue, newValue)) > + return newValue; > > > 2. replaceAll javadoc: Function#map => Function#apply > > calling the function's {@code Function#map} method > > => > calling the function's {@code Function#apply} method > > > 3. replaceAll question > > What's with all the finals? > > + final Iterator> entries = entrySet().iterator(); > + while (entries.hasNext()) { > + final Map.Entry entry = entries.next(); > + entry.setValue(function.apply(entry.getKey(), > entry.getValue())); > + } > > Why not code this as follows, just like forEach? > > + for (Map.Entry entry : entrySet()) { > + entry.setValue(function.apply(entry.getKey(), > entry.getValue())); > + } > > --Joe > > > On Thu, Feb 21, 2013 at 11:17 AM, Brian Goetz wrote: > >> At >> http://cr.openjdk.java.net/~**briangoetz/jdk-8008670/webrev/ >> >> I've posted a webrev for about half the classes in java.util.stream. 
None >> of these are public classes, so there are no public API issues here, but >> plenty of internal API issues, naming issues (ooh, a bikeshed), and code >> quality issues. >> >> > From david.holmes at oracle.com Sun Feb 24 19:07:48 2013 From: david.holmes at oracle.com (David Holmes) Date: Mon, 25 Feb 2013 13:07:48 +1000 Subject: Code review request In-Reply-To: <51289F3D.1010609@univ-mlv.fr> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> Message-ID: <512AD584.2080902@oracle.com> On 23/02/2013 8:51 PM, Remi Forax wrote: > On 02/21/2013 08:17 PM, Brian Goetz wrote: >> At >> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >> >> I've posted a webrev for about half the classes in java.util.stream. >> None of these are public classes, so there are no public API issues >> here, but plenty of internal API issues, naming issues (ooh, a >> bikeshed), and code quality issues. >> > > Hi Brian, > > All protected fields should not be protected but package visible. > Classes are package private so there is no need to use a modifier which > offer a wider visibility. > The same is true for constructors. I believe some of these may end up being public (TBD), in which case better to define member accessibility as if they were already public as it greatly simplifies the changes needed later. David ----- > For default method, some of them are marked public, some of them are not, > what the coding convention said ? > > Code convention again, there is a lot of if/else with no curly braces, > or only curly braces > on the if part but not on the else part. 
> Also, when a if block ends with a return, there is no need to use 'else', > > if (result != null) { > foundResult(result); > return result; > } > else > return null; > > can be simply written: > > if (result != null) { > foundResult(result); > return result; > } > return null; > > > All inner class should not have private constructors, like by example > FindOp.FindTask, because > the compiler will have to generate a special accessor for them when they > are called from > the outer class. > > In AbstractShortCircuitTask: > It's not clear that cancel and sharedResult can be accessed directly > given that they both have methods that acts as getter and setter. > If they can be accessed directly, I think it's better to declare them > private and to use getters. > > Depending on the ops, some of them do nullcheks of arguments at creating > time (ForEachOp) , some of them don't (FindOp). > In ForEachUntilOp, the 'consumer' is checked but 'until' is not. > > in ForEachOp, most of the keyword protected are not needed, > ForEachUntilOp which inherits from ForEachOp is in the same package. > > In ForEachUntilOp, the constructor should be private (like for all the > other ops). > > In MatchOp, line 110, I think the compiler bug is fixed now ? > The enum MatchKind should not be public and all constructor of all inner > classes should not be private. > > In OpsUtils, some static methods are public and some are not, > > in Tripwire: > enabled should be in uppercase (ENABLED). 
> method trip() should be: > public static void trip(Class trippingClass, String msg) > > cheers, > R?mi > From forax at univ-mlv.fr Mon Feb 25 01:03:24 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 25 Feb 2013 10:03:24 +0100 Subject: Code review request In-Reply-To: <512AD584.2080902@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <512AD584.2080902@oracle.com> Message-ID: <512B28DC.7050101@univ-mlv.fr> On 02/25/2013 04:07 AM, David Holmes wrote: > On 23/02/2013 8:51 PM, Remi Forax wrote: >> On 02/21/2013 08:17 PM, Brian Goetz wrote: >>> At >>> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >>> >>> I've posted a webrev for about half the classes in java.util.stream. >>> None of these are public classes, so there are no public API issues >>> here, but plenty of internal API issues, naming issues (ooh, a >>> bikeshed), and code quality issues. >>> >> >> Hi Brian, >> >> All protected fields should not be protected but package visible. >> Classes are package private so there is no need to use a modifier which >> offer a wider visibility. >> The same is true for constructors. > > I believe some of these may end up being public (TBD), in which case > better to define member accessibility as if they were already public > as it greatly simplifies the changes needed later. > > David > ----- Given that the release of jdk9 is at least two years from now, this API will change, one will come with a GPU pipeline (Sumatra?) or with a flattened bytecode pipeline (my pet project), so trying to figure out now what should be public or not is like predicting the future in a crystal ball. I think it's better to let all members package private and see later. BTW, I have no problem with protected methods, my main concern is protected fields or protected inner classes. R?mi > > >> For default method, some of them are marked public, some of them are >> not, >> what the coding convention said ? 
>> >> Code convention again, there is a lot of if/else with no curly braces, >> or only curly braces >> on the if part but not on the else part. >> Also, when a if block ends with a return, there is no need to use >> 'else', >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> else >> return null; >> >> can be simply written: >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> return null; >> >> >> All inner class should not have private constructors, like by example >> FindOp.FindTask, because >> the compiler will have to generate a special accessor for them when they >> are called from >> the outer class. >> >> In AbstractShortCircuitTask: >> It's not clear that cancel and sharedResult can be accessed directly >> given that they both have methods that acts as getter and setter. >> If they can be accessed directly, I think it's better to declare them >> private and to use getters. >> >> Depending on the ops, some of them do nullcheks of arguments at creating >> time (ForEachOp) , some of them don't (FindOp). >> In ForEachUntilOp, the 'consumer' is checked but 'until' is not. >> >> in ForEachOp, most of the keyword protected are not needed, >> ForEachUntilOp which inherits from ForEachOp is in the same package. >> >> In ForEachUntilOp, the constructor should be private (like for all the >> other ops). >> >> In MatchOp, line 110, I think the compiler bug is fixed now ? >> The enum MatchKind should not be public and all constructor of all inner >> classes should not be private. >> >> In OpsUtils, some static methods are public and some are not, >> >> in Tripwire: >> enabled should be in uppercase (ENABLED). 
>> method trip() should be: >> public static void trip(Class trippingClass, String msg) >> >> cheers, >> R?mi >> From paul.sandoz at oracle.com Mon Feb 25 09:31:32 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 25 Feb 2013 18:31:32 +0100 Subject: Code review request In-Reply-To: <51289F3D.1010609@univ-mlv.fr> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> Message-ID: <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> Hi Remi, Thanks for the feedback i have addressed some of this, mostly related to inner classes, in following change set to the lambda repo: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea We can update the webrev next week. On Feb 23, 2013, at 11:51 AM, Remi Forax wrote: > On 02/21/2013 08:17 PM, Brian Goetz wrote: >> At >> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >> >> I've posted a webrev for about half the classes in java.util.stream. None of these are public classes, so there are no public API issues here, but plenty of internal API issues, naming issues (ooh, a bikeshed), and code quality issues. >> > > Hi Brian, > > All protected fields should not be protected but package visible. > Classes are package private so there is no need to use a modifier which offer a wider visibility. > The same is true for constructors. > I agree with this, if there are no further objections i will fix in the lambda repo towards the end of the week. > For default method, some of them are marked public, some of them are not, > what the coding convention said ? > AFAICT "public" was only on two such default methods, so i have removed that modifier. > Code convention again, there is a lot of if/else with no curly braces, or only curly braces > on the if part but not on the else part. 
> Also, when a if block ends with a return, there is no need to use 'else', > > if (result != null) { > foundResult(result); > return result; > } > else > return null; > > can be simply written: > > if (result != null) { > foundResult(result); > return result; > } > return null; > Regarding code conventions i would prefer to auto-format all code to ensure consistency, as to what that consistency is, well we could argue until heat death of the universe :-) I am fine as long as it is consistent and easy to hit Alt-Cmd-L or what ever it is in ones favourite IDE. > > All inner class should not have private constructors, like by example FindOp.FindTask, because > the compiler will have to generate a special accessor for them when they are called from > the outer class. > I have made changes to all inner classes to conform to this. I have also marked all classes as final where appropriate. > In AbstractShortCircuitTask: > It's not clear that cancel and sharedResult can be accessed directly given that they both have methods that acts as getter and setter. > If they can be accessed directly, I think it's better to declare them private and to use getters. > They should be private, they are not accessed outside of that class. I will fix. > Depending on the ops, some of them do nullcheks of arguments at creating time (ForEachOp) , some of them don't (FindOp). > In ForEachUntilOp, the 'consumer' is checked but 'until' is not. > OK, there are probably lots of missing null checks in the code... > in ForEachOp, most of the keyword protected are not needed, ForEachUntilOp which inherits from ForEachOp is in the same package. > > In ForEachUntilOp, the constructor should be private (like for all the other ops). > Done. > In MatchOp, line 110, I think the compiler bug is fixed now ? Not yet, i can still reproduce it. > The enum MatchKind should not be public and all constructor of all inner classes should not be private. > Done. 
> In OpsUtils, some static methods are public and some are not, > OpUtils is now gone in the lambda repo. The forEach and reduce functionality is moved into the corresponding op classes. The static method has been moved to a default method on PipelineHelper. > in Tripwire: > enabled should be in uppercase (ENABLED). > method trip() should be: > public static void trip(Class trippingClass, String msg) > Done. I also made the field and method package private. Thanks, Paul. From joe.bowbeer at gmail.com Mon Feb 25 12:46:15 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 25 Feb 2013 12:46:15 -0800 Subject: Code review request In-Reply-To: <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> Message-ID: On Feb 25, 2013 9:31 AM, "Paul Sandoz" wrote: > > Hi Remi, > > Thanks for the feedback i have addressed some of this, mostly related to inner classes, in following change set to the lambda repo: > > http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea > > We can update the webrev next week. > > > On Feb 23, 2013, at 11:51 AM, Remi Forax wrote: > >> On 02/21/2013 08:17 PM, Brian Goetz wrote: >>> >>> At >>> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >>> >>> I've posted a webrev for about half the classes in java.util.stream. None of these are public classes, so there are no public API issues here, but plenty of internal API issues, naming issues (ooh, a bikeshed), and code quality issues. 
>>> >> >> Hi Brian, >> >> Also, when a if block ends with a return, there is no need to use 'else', >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> else >> return null; >> >> can be simply written: >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> return null; >> > > Regarding code conventions i would prefer to auto-format all code to ensure consistency, as to what that consistency is, well we could argue until heat death of the universe :-) I am fine as long as it is consistent and easy to hit Alt-Cmd-L or what ever it is in ones favourite IDE. > The omission of else after a return is a refinement that is not covered in any style guide that I am aware of. However, I think most everything else is covered by these: http://www.oracle.com/technetwork/java/codeconv-138413.html Alt-Shift-F ;-) Joe From david.holmes at oracle.com Mon Feb 25 13:45:47 2013 From: david.holmes at oracle.com (David Holmes) Date: Tue, 26 Feb 2013 07:45:47 +1000 Subject: Code review request In-Reply-To: <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> Message-ID: <512BDB8B.9070005@oracle.com> On 26/02/2013 3:31 AM, Paul Sandoz wrote: > Hi Remi, > > Thanks for the feedback i have addressed some of this, mostly related to > inner classes, in following change set to the lambda repo: > > http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea I see a lot of private things that are now package-access. Is that because they are now being used within the package? The access modifiers document intended usage even if there is limited accessibility to the class defining the member. The idea that a class restricted to package-access should have member access modifiers restricted to package-only or else private, is just plain wrong in my view. Each type should have a public, protected and private API. 
The exposure of the type within a package is a separate matter. Package-access then becomes a limited-sharing mechanism. David > We can update the webrev next week. > > > On Feb 23, 2013, at 11:51 AM, Remi Forax > wrote: > >> On 02/21/2013 08:17 PM, Brian Goetz wrote: >>> At >>> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >>> >>> I've posted a webrev for about half the classes in java.util.stream. >>> None of these are public classes, so there are no public API issues >>> here, but plenty of internal API issues, naming issues (ooh, a >>> bikeshed), and code quality issues. >>> >> >> Hi Brian, >> >> All protected fields should not be protected but package visible. >> Classes are package private so there is no need to use a modifier >> which offer a wider visibility. >> The same is true for constructors. >> > > I agree with this, if there are no further objections i will fix in the > lambda repo towards the end of the week. > > >> For default method, some of them are marked public, some of them are not, >> what the coding convention said ? >> > > AFAICT "public" was only on two such default methods, so i have removed > that modifier. > > >> Code convention again, there is a lot of if/else with no curly braces, >> or only curly braces >> on the if part but not on the else part. >> Also, when a if block ends with a return, there is no need to use 'else', >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> else >> return null; >> >> can be simply written: >> >> if (result != null) { >> foundResult(result); >> return result; >> } >> return null; >> > > Regarding code conventions i would prefer to auto-format all code to > ensure consistency, as to what that consistency is, well we could argue > until heat death of the universe :-) I am fine as long as it is > consistent and easy to hit Alt-Cmd-L or what ever it is in ones > favourite IDE. 
> > >> >> All inner class should not have private constructors, like by example >> FindOp.FindTask, because >> the compiler will have to generate a special accessor for them when >> they are called from >> the outer class. >> > > I have made changes to all inner classes to conform to this. I have also > marked all classes as final where appropriate. > > >> In AbstractShortCircuitTask: >> It's not clear that cancel and sharedResult can be accessed directly >> given that they both have methods that acts as getter and setter. >> If they can be accessed directly, I think it's better to declare them >> private and to use getters. >> > > They should be private, they are not accessed outside of that class. I > will fix. > > >> Depending on the ops, some of them do nullcheks of arguments at >> creating time (ForEachOp) , some of them don't (FindOp). >> In ForEachUntilOp, the 'consumer' is checked but 'until' is not. >> > > OK, there are probably lots of missing null checks in the code... > > >> in ForEachOp, most of the keyword protected are not needed, >> ForEachUntilOp which inherits from ForEachOp is in the same package. >> > > >> In ForEachUntilOp, the constructor should be private (like for all the >> other ops). >> > > Done. > > >> In MatchOp, line 110, I think the compiler bug is fixed now ? > > Not yet, i can still reproduce it. > > >> The enum MatchKind should not be public and all constructor of all >> inner classes should not be private. >> > > Done. > > >> In OpsUtils, some static methods are public and some are not, >> > > OpUtils is now gone in the lambda repo. The forEach and reduce > functionality is moved into the corresponding op classes. The static > method has been moved to a default method on PipelineHelper. > > >> in Tripwire: >> enabled should be in uppercase (ENABLED). >> method trip() should be: >> public static void trip(Class trippingClass, String msg) >> > > Done. I also made the field and method package private. > > Thanks, > Paul. 
> method trip() should be: > public static void trip(Class<?> trippingClass, String msg) > > Done. I also made the field and method package private. > > Thanks, > Paul. From sam at sampullara.com Mon Feb 25 21:16:27 2013 From: sam at sampullara.com (Sam Pullara) Date: Mon, 25 Feb 2013 21:16:27 -0800 Subject: Where has the map method on Optional moved? In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> Message-ID: I've never been comfortable with this. I'm glad Jed is calling it out. Can we make Optional first class or remove it? Sam On Mon, Feb 25, 2013 at 9:12 PM, Jed Wesley-Smith wrote: > Hi Paul, > > You don't get a choice, it is a (or forms a) monad, you just removed > the useful methods (map/flatMap aka fmap/bind). This leaves clients to > implement them (or the functionality) in an ad-hoc and possibly buggy > form themselves. > > It is a monad if there exists some pair of functions:
>
> A -> Option<A>
> Option<A> -> (A -> Option<B>) -> Option<B>
>
> The first is Optional.of, the second is currently:
>
> Optional<A> a = ...
> Optional<B> b = ...
> Function<A, Optional<B>> f = ...
> if (a.isPresent()) {
>     b = f.apply(a.get());
> } else {
>     b = Optional.empty();
> }
>
> rather than:
>
> Optional<A> a = ...
> Function<A, Optional<B>> f = ...
> final Optional<B> b = a.flatMap(f);
>
> cheers, > jed. > > On 26 February 2013 00:12, Paul Sandoz wrote: >> Hi Dhananjay, >> >> It is not missing, it was removed. >> >> java.util.Optional has a narrower scope than optional things in other languages. We are not trying to shoe-horn in an option monad. >> >> Paul. >> >> On Feb 23, 2013, at 12:27 AM, Dhananjay Nene wrote: >> >>> It seemed to be there on the Optional class in b61 but is missing now. Is >>> there some way to run map/flatMap operations on an Optional? >>> >>> Thanks >>> Dhananjay >>> >> >> > From forax at univ-mlv.fr Mon Feb 25 23:19:26 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 26 Feb 2013 08:19:26 +0100 Subject: Re: Where has the map method on Optional moved? Message-ID: <201302260719.r1Q7JNh4004890@monge.univ-mlv.fr> Yes, I vote to remove it because it doesn't map :) well with the java mindset.
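[Editorial sketch inserted for context: the equivalence Jed describes can be run end-to-end. The parse helper below is invented for illustration; bind is his hand-rolled isPresent/get expansion, shown next to Optional.flatMap.]

```java
import java.util.Optional;
import java.util.function.Function;

public class OptionalMonadDemo {
    // An invented function of type A -> Optional<B>: parse a string if possible.
    static Optional<Integer> parse(String s) {
        try {
            return Optional.of(Integer.parseInt(s.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // The hand-rolled "bind" Jed objects to, written out with isPresent/get.
    static <A, B> Optional<B> bind(Optional<A> a, Function<A, Optional<B>> f) {
        if (a.isPresent()) {
            return f.apply(a.get());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> some = Optional.of(" 42 ");
        Optional<String> none = Optional.empty();

        // The manual expansion and flatMap agree in both the present
        // and the empty case.
        System.out.println(bind(some, OptionalMonadDemo::parse));
        System.out.println(some.flatMap(OptionalMonadDemo::parse));
        System.out.println(none.flatMap(OptionalMonadDemo::parse));
    }
}
```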
That said, we already discussed that and other alternatives are less nice to use, at least until we use the static import trick (as with reducers) Sent from my Phone ----- Reply message ----- From: "Sam Pullara" To: Subject: Where has the map method on Optional moved? Date: Tue, Feb 26, 2013 06:16 I've never been comfortable with this. I'm glad Jed is calling it out. Can we make Optional first class or remove it? Sam On Mon, Feb 25, 2013 at 9:12 PM, Jed Wesley-Smith wrote: > Hi Paul, > > You don't get a choice, it is a (or forms a) monad, you just removed > the useful methods (map/flatMap aka fmap/bind). This leaves clients to > implement them (or the functionality) in an ad-hoc and possibly buggy > form themselves. > > It is a monad if there exists some pair of functions: > > A -> Option<A> > Option<A> -> (A -> Option<B>) -> Option<B> > > The first is Optional.of, the second is currently: > > Optional<A> a = ? > Optional<B> b = ? > Function<A, Optional<B>> f = ? > if (a.isPresent()) { > b = f.apply(a.get()); > } else { > b = Optional.empty(); > } > > rather than: > > Optional<A> a = ? > Function<A, Optional<B>> f = ? > final Optional<B> b = a.flatMap(f); > > cheers, > jed. > > On 26 February 2013 00:12, Paul Sandoz wrote: >> Hi Dhananjay, >> >> It is not missing it was removed. >> >> java.util.Optional has a narrower scope than optional things in other languages. We are not trying to shoe-horn in an option monad. >> >> Paul. >> >> On Feb 23, 2013, at 12:27 AM, Dhananjay Nene wrote: >> >>> It seemed to be there on the Optional class in b61 but is missing now. Is >>> there some way to run map/flatMap operations on an Optional? >>> >>> Thanks >>> Dhananjay >>> >> >> > From forax at univ-mlv.fr Tue Feb 26 00:15:00 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 26 Feb 2013 09:15:00 +0100 Subject: Where has the map method on Optional moved?
In-Reply-To: <201302260719.r1Q7JNh4004890@monge.univ-mlv.fr> References: <201302260719.r1Q7JNh4004890@monge.univ-mlv.fr> Message-ID: <512C6F04.1020206@univ-mlv.fr> On 02/26/2013 08:19 AM, Remi Forax wrote: > Yes, I vote to remove it because it doesn't map :) well with the java > mindset. > That said, we already discussed that and other alternatives are less > nice to use, at least until we use the static import trick (as with > reducers) just to be crystal clear.

interface Optionalizer<T, R> { // good name needed
    abstract R result(boolean isPresent, T element);

    public static <T> Optionalizer<T, T> orDefaultValue(T defaultValue) {
        return (isPresent, element) -> isPresent ? element : defaultValue;
    }

    public static <T> Optionalizer<T, Boolean> isPresent() {
        return (isPresent, element) -> isPresent;
    }

    public static <T> Optionalizer<T, Void> andIfPresent(Consumer<T> consumer) {
        return (isPresent, element) -> { if (isPresent) { consumer.accept(element); } return null; };
    }

    public static <T> Optionalizer<T, T> orNull() {
        // doesn't use orDefaultValue(null) because the returned lambda is not constant :)
        // maybe better to do a null check in orDefaultValue
        return (isPresent, element) -> isPresent ? element : null;
    }
}

with:

interface Stream<T> {
    <R> R findFirst(Optionalizer<T, R> optionalizer);
}

examples:

String s = streamOfString.findFirst(orDefaultValue(""));
boolean isPresent = streamOfString.findFirst(isPresent());
streamOfString.findFirst(andIfPresent(System.out::println));
String s2 = streamOfString.findFirst(orNull());

I think i can like this. Rémi > > > > Sent from my Phone > > ----- Reply message ----- > From: "Sam Pullara" > To: > Subject: Where has the map method on Optional moved? > Date: Tue, Feb 26, 2013 06:16 > > > I've never been comfortable with this. I'm glad Jed is calling it out. > Can we make Optional first class or remove it? > > Sam > > On Mon, Feb 25, 2013 at 9:12 PM, Jed Wesley-Smith > wrote: > > Hi Paul, > > > > You don't get a choice, it is a (or forms a) monad, you just removed > > the useful methods (map/flatMap aka fmap/bind).
This leaves clients to > > implement them (or the functionality) in an ad-hoc and possibly buggy > > form themselves. > > > > It is a monad if there exists some pair of functions: > > > > A -> Option<A> > > Option<A> -> (A -> Option<B>) -> Option<B> > > > > The first is Optional.of, the second is currently: > > > > Optional<A> a = ? > > Optional<B> b = ? > > Function<A, Optional<B>> f = ? > > if (a.isPresent()) { > > b = f.apply(a.get()); > > } else { > > b = Optional.empty(); > > } > > > > rather than: > > > > Optional<A> a = ? > > Function<A, Optional<B>> f = ? > > final Optional<B> b = a.flatMap(f); > > > > cheers, > > jed. > > > > On 26 February 2013 00:12, Paul Sandoz wrote: > >> Hi Dhananjay, > >> > >> It is not missing it was removed. > >> > >> java.util.Optional has a narrower scope than optional things in > other languages. We are not trying to shoe-horn in an option monad. > >> > >> Paul. > >> > >> On Feb 23, 2013, at 12:27 AM, Dhananjay Nene > wrote: > >> > >>> It seemed to be there on the Optional class in b61 but is missing > now. Is > >>> there some way to run map/flatMap operations on an Optional? > >>> > >>> Thanks > >>> Dhananjay > >>> > >> > >> > > From paul.sandoz at oracle.com Tue Feb 26 02:47:25 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 11:47:25 +0100 Subject: Where has the map method on Optional moved? In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> Message-ID: On Feb 26, 2013, at 6:16 AM, Sam Pullara wrote: > I've never been comfortable with this. I'm glad Jed is calling it out. > Can we make Optional first class or remove it? Trawling through the email archives i cannot find specific discussion on Optional.map/flatMap, anyone recall such discussion? There is some general discussion on not going down the route of an option monad.
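For concreteness, the pair of operations Jed describes (unit and bind) can be sketched with the map/flatMap methods that did eventually ship on java.util.Optional in JDK 8:

```java
import java.util.Optional;
import java.util.function.Function;

public class OptionalBind {
    public static void main(String[] args) {
        // unit: A -> Optional<A>
        Optional<String> a = Optional.of("42");

        // bind: Optional<A> -> (A -> Optional<B>) -> Optional<B>
        Function<String, Optional<Integer>> f = s ->
                s.chars().allMatch(Character::isDigit)
                        ? Optional.of(Integer.parseInt(s))
                        : Optional.<Integer>empty();

        Optional<Integer> b = a.flatMap(f);
        System.out.println(b.get()); // 42

        // the empty case short-circuits, replacing the ad-hoc if/else pattern
        Optional<Integer> none = Optional.of("x").flatMap(f);
        System.out.println(none.isPresent()); // false
    }
}
```

With flatMap available, the caller never touches isPresent()/get() directly, which is exactly the ad-hoc (and possibly buggy) client code the thread is worried about.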
Since we have added a Stream.flatMap method following the pattern expected of it, does that give more weight to the argument for adding such a method to Optional, perhaps with the same or a different name to disassociate it from Stream itself? I find myself more and more leaning towards the position that if we include an Option class some developers will expect bind functionality; it seems useful, and it is not hard to explain its use while avoiding mention of the "m" word. Paul. > > Sam > > On Mon, Feb 25, 2013 at 9:12 PM, Jed Wesley-Smith wrote: >> Hi Paul, >> >> You don't get a choice, it is a (or forms a) monad, you just removed >> the useful methods (map/flatMap aka fmap/bind). This leaves clients to >> implement them (or the functionality) in an ad-hoc and possibly buggy >> form themselves. >> >> It is a monad if there exists some pair of functions: >> >> A -> Option<A> >> Option<A> -> (A -> Option<B>) -> Option<B> >> >> The first is Optional.of, the second is currently: >> >> Optional<A> a = ? >> Optional<B> b = ? >> Function<A, Optional<B>> f = ? >> if (a.isPresent()) { >> b = f.apply(a.get()); >> } else { >> b = Optional.empty(); >> } >> >> rather than: >> >> Optional<A> a = ? >> Function<A, Optional<B>> f = ? >> final Optional<B> b = a.flatMap(f); >> >> cheers, >> jed. >> >> On 26 February 2013 00:12, Paul Sandoz wrote: >>> Hi Dhananjay, >>> >>> It is not missing it was removed. >>> >>> java.util.Optional has a narrower scope than optional things in other languages. We are not trying to shoe-horn in an option monad. >>> >>> Paul. >>> >>> On Feb 23, 2013, at 12:27 AM, Dhananjay Nene wrote: >>> >>>> It seemed to be there on the Optional class in b61 but is missing now. Is >>>> there some way to run map/flatMap operations on an Optional?
>>>> >>>> Thanks >>>> Dhananjay >>>> >>> >>> >> From paul.sandoz at oracle.com Tue Feb 26 03:03:00 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 12:03:00 +0100 Subject: Code review request In-Reply-To: References: <512672E6.1050708@oracle.com> Message-ID: <4D7CE6AD-5BC6-4FBA-A41E-C8BC182F2F64@oracle.com> Hi Tim, Thanks for the comments. On Feb 23, 2013, at 6:06 PM, Tim Peierls wrote: > On Thu, Feb 21, 2013 at 2:17 PM, Brian Goetz wrote: > >> At >> http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ >> >> I've posted a webrev for about half the classes in java.util.stream. None >> of these are public classes, so there are no public API issues here, but >> plenty of internal API issues, naming issues (ooh, a bikeshed), and code >> quality issues. >> > > Things I noticed before I ran out of steam: > > In AbstractTask the use of multicharacter type > parameters is confusing, especially with an underscore. AbstractTask R, C>, , or even would be better. > I added "P_IN" etc because i found i kept forgetting which single character type variable corresponded to what :-) So for the following:

interface IntermediateOp<E_IN, E_OUT> {
    Sink<E_IN> wrapSink(int flags, Sink<E_OUT> sink);
    default <P_IN> Node<E_OUT> evaluateParallel(PipelineHelper<P_IN, E_OUT> helper) ...
}

I found it clearer which variables were corresponding to the input/output types of the pipeline and which to the ops, and have attempted to stick to that pattern throughout the code, although i suspect it is not completely consistent. Having said that, i think consistency of use is the most important aspect. > BiBlock -> BiConsumer in Map.java comments. > Fixed in the lambda repo. Paul.
On Feb 23, 2013, at 8:42 PM, Joe Bowbeer wrote: > We should send these comments in emails? I don't see a way to comment at > the link provided. > > I repeat some of Remi's comments regarding formatting below. > > File: > > http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/src/share/classes/java/util/Map.java.patch > > 1. Please run this through a code formatter to conform with Oracle's > standard. Things to fix: > > parameter wrapping should indent only 8 spaces: > > + default V merge(K key, V value, > + BiFunction<? super V, ? super V, ? extends V> > remappingFunction) { > > if-else brace should be on same line: > > + } > + else if ((newValue = remappingFunction.apply(oldValue, value)) != null) { > > multi-line 'if' always needs braces? > > + if (replace(key, oldValue, newValue)) > + return newValue; > > > 2. replaceAll javadoc: Function#map => Function#apply > > calling the function's {@code Function#map} method > > => > calling the function's {@code Function#apply} method > Fixed in the lambda repo. > > 3. replaceAll question > > What's with all the finals? > > + final Iterator<Map.Entry<K, V>> entries = entrySet().iterator(); > + while (entries.hasNext()) { > + final Map.Entry<K, V> entry = entries.next(); > + entry.setValue(function.apply(entry.getKey(), > entry.getValue())); > + } > > Why not code this as follows, just like forEach? > > + for (Map.Entry<K, V> entry : entrySet()) { > + entry.setValue(function.apply(entry.getKey(), > entry.getValue())); > + } > Fixed. On Feb 24, 2013, at 10:09 PM, Joe Bowbeer wrote: > A few more comments. > > 1. General: > > The method descriptions should be written 3rd person declarative, according > to Oracle's style guide > > http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#styleguide > > This is not followed in many places. For example: > > Get the {@code StreamShape} describing the input shape of the pipeline > > => > Gets the {@code StreamShape} describing the input shape of the pipeline. > > > 2.
Typo (missing space) in PipelineHelper javadoc: > > 40 * the last intermediate operation described by this {@code > PipelineHelper}.The > Fixed. > > 3. StreamShape enum is missing its per-element javadoc > Fixed. Paul. From tim at peierls.net Tue Feb 26 04:37:19 2013 From: tim at peierls.net (Tim Peierls) Date: Tue, 26 Feb 2013 07:37:19 -0500 Subject: Where has the map method on Optional moved? In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> Message-ID: On Tue, Feb 26, 2013 at 5:47 AM, Paul Sandoz wrote: > On Feb 26, 2013, at 6:16 AM, Sam Pullara wrote: > > I've never been comfortable with this. I'm glad Jed is calling it out. > > Can we make Optional first class or remove it? > > Trawling through the email archives i cannot find specific discussion on > Optional.map/flatMap, anyone recall such discussion? There is some general > discussion on not going down the route of an option monad. > In particular, you wrote: > java.util.Optional has a narrower scope that optional things in other > languages. We are not trying to shoe-horn in an option monad. And I say amen to that. Keep Optional, and keep it lean and mean. (At least as lean as the Guava Optional.) > Since we have added a Stream.flatMap method following the pattern expected > of it, does that give more weight to argument of adding such a method to > Optional, perhaps of the same or a different name to disassociate with > Stream itself? > No, it doesn't. The two are unrelated. Stream is a lofty abstraction and Optional is (or should be) a grunt-level utility. > I find myself more and more leaning towards the position of if we include > an Option class some developers will expect bind functionality, it seems > useful, and nor is it hard to explain its use while avoiding mention the > "m" word. 
> The more you saddle Optional with "might be useful" and "X might want this" features, the harder it will be for regular people -- people who don't know what a monad is -- to assimilate and use Optional sensibly. One of the reasons Doug Lea is not an Optional fan is fear of things like Collection<Optional<T>>, something that is less likely to happen with a minimalist Optional. Here's another reason to stay lean: The more limited Optional is, the easier it will be some day to optimize away the extra object. Make it a first class participant and you can kiss those optimizations goodbye. --tim From paul.sandoz at oracle.com Tue Feb 26 04:49:48 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 13:49:48 +0100 Subject: Where has the map method on Optional moved? In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> Message-ID: <7673857C-B407-489F-AB19-7B5317AB9AE7@oracle.com> On Feb 26, 2013, at 1:37 PM, Tim Peierls wrote: > On Tue, Feb 26, 2013 at 5:47 AM, Paul Sandoz wrote: > On Feb 26, 2013, at 6:16 AM, Sam Pullara wrote: > > I've never been comfortable with this. I'm glad Jed is calling it out. > > Can we make Optional first class or remove it? > > Trawling through the email archives i cannot find specific discussion on Optional.map/flatMap, anyone recall such discussion? There is some general discussion on not going down the route of an option monad. > > In particular, you wrote: > java.util.Optional has a narrower scope than optional things in other languages. We are not trying to shoe-horn in an option monad. > > And I say amen to that. Keep Optional, and keep it lean and mean. (At least as lean as the Guava Optional.)
> I notice Guava's Optional has a transform method: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Optional.html#transform(com.google.common.base.Function) > > Since we have added a Stream.flatMap method following the pattern expected of it, does that give more weight to argument of adding such a method to Optional, perhaps of the same or a different name to disassociate with Stream itself? > > No, it doesn't. The two are unrelated. Stream is a lofty abstraction and Optional is (or should be) a grunt-level utility. > > > I find myself more and more leaning towards the position of if we include an Option class some developers will expect bind functionality, it seems useful, and nor is it hard to explain its use while avoiding mention the "m" word. > > The more you saddle Optional with "might be useful" and "X might want this" features, the harder it will be for regular people -- people who don't know what a monad is -- to assimilate and use Optional sensibly. One of the reasons Doug Lea is not an Optional fan is fear of things like Collection>, something that is less likely to happen with a minimalist Optional. > Especially so if we make it more difficult by removing the hashcode and equals methods. > Here's another reason to stay lean: The more limited Optional is, the easier it will be some day to optimize away the extra object. Make it a first class participant and you can kiss those optimizations goodbye. > Very true. Paul. From tim at peierls.net Tue Feb 26 05:02:55 2013 From: tim at peierls.net (Tim Peierls) Date: Tue, 26 Feb 2013 08:02:55 -0500 Subject: Where has the map method on Optional moved? In-Reply-To: <7673857C-B407-489F-AB19-7B5317AB9AE7@oracle.com> References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> <7673857C-B407-489F-AB19-7B5317AB9AE7@oracle.com> Message-ID: On Tue, Feb 26, 2013 at 7:49 AM, Paul Sandoz wrote: > Keep Optional, and keep it lean and mean. (At least as lean as the Guava > Optional.) 
> > I notice Guava's Optional has a transform method: > Right -- "at *least* as lean". But leaner is better. I haven't used the transform method. It feels like the top of the slippery slope that leads to monstrosities like Map<K, Optional<V>>. > One of the reasons Doug Lea is not an Optional fan is fear of things like > Collection<Optional<T>>, something that is less likely to happen with a > minimalist Optional. > > Especially so if we make it more difficult by removing the hashCode and > equals methods. > Yes! With docs explaining why. --tim From paul.sandoz at oracle.com Tue Feb 26 05:11:50 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 14:11:50 +0100 Subject: Code review request In-Reply-To: <512BDB8B.9070005@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> <512BDB8B.9070005@oracle.com> Message-ID: <958F39D3-EB3A-4FD2-ABAF-5109BDDD8D49@oracle.com> On Feb 25, 2013, at 10:45 PM, David Holmes wrote: > On 26/02/2013 3:31 AM, Paul Sandoz wrote: >> Hi Remi, >> >> Thanks for the feedback i have addressed some of this, mostly related to >> inner classes, in following change set to the lambda repo: >> >> http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea > > I see a lot of private things that are now package-access. I presume you mean on constructors of private inner classes? > Is that because they are now being used within the package? > No, it is to avoid the creation of a synthetic package private constructor called by enclosing class to construct the inner class. > The access modifiers document intended usage even if there is limited accessibility to the class defining the member. The idea that a class restricted to package-access should have member access modifiers restricted to package-only or else private, is just plain wrong in my view. Each type should have a public, protected and private API.
Package-access then becomes a limited-sharing mechanism. > For private inner classes i took the view that protected on fields offered little value, but paused for top level classes. There are not many use-cases in the JDK at least for the packages i browsed. The class java.util.concurrent.atomic.Striped64 does not bother with protected. I am leaning towards the opinion that protected is just noise in these cases since the compiler offers no protection. Paul. From forax at univ-mlv.fr Tue Feb 26 05:40:37 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 26 Feb 2013 14:40:37 +0100 Subject: Code review request In-Reply-To: <958F39D3-EB3A-4FD2-ABAF-5109BDDD8D49@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> <512BDB8B.9070005@oracle.com> <958F39D3-EB3A-4FD2-ABAF-5109BDDD8D49@oracle.com> Message-ID: <512CBB55.7080301@univ-mlv.fr> On 02/26/2013 02:11 PM, Paul Sandoz wrote: > On Feb 25, 2013, at 10:45 PM, David Holmes wrote: > >> On 26/02/2013 3:31 AM, Paul Sandoz wrote: >>> Hi Remi, >>> >>> Thanks for the feedback i have addressed some of this, mostly related to >>> inner classes, in following change set to the lambda repo: >>> >>> http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea >> I see a lot of private things that are now package-access. > I presume you mean on constructors of private inner classes? > > >> Is that because they are now being used within the package? >> > No, it is to avoid the creation of a synthetic package private constructor called by enclosing class to construct the inner class. > > >> The access modifiers document intended usage even if there is limited accessibility to the class defining the member. The idea that a class restricted to package-access should have member access modifiers restricted to package-only or else private, is just plain wrong in my view. Each type should have a public, protected and private API. 
The exposure of the type within a package is a separate matter. Package-access then becomes a limited-sharing mechanism. >> > For private inner classes i took the view that protected on fields offered little value, but paused for top level classes. > > There are not many use-cases in the JDK at least for the packages i browsed. The class java.util.concurrent.atomic.Striped64 does not bother with protected. > > I am leaning towards the opinion that protected is just noise in these cases since the compiler offers no protection. amen :) > > Paul. Rémi From forax at univ-mlv.fr Tue Feb 26 05:50:11 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 26 Feb 2013 14:50:11 +0100 Subject: Code review request In-Reply-To: <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> Message-ID: <512CBD93.1080805@univ-mlv.fr> On 02/25/2013 06:31 PM, Paul Sandoz wrote: > Hi Remi, > > Thanks for the feedback i have addressed some of this, mostly related to inner classes, in following change set to the lambda repo: > > http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea > > We can update the webrev next week. There are still some methods that are declared 'default public' and some that are declared just with 'default'. I propose the following code convention for abstract/default methods in interfaces. All methods in an interface are marked public (just because we may support private static methods in jdk9), default methods should be 'public default' and not 'default public', like we have public static, visibility modifier first, and abstract methods in the same interface should be declared only 'public'. Rémi ...
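The modifier-ordering convention proposed above can be illustrated with a small interface; the names here are made up for the example:

```java
// A hypothetical interface following the proposed convention:
// 'public default' rather than 'default public', and abstract
// methods declared with 'public' only.
interface Wrapper<T> {
    public T get();                           // abstract: 'public' alone

    public default T orElse(T fallback) {     // visibility first, then 'default'
        T value = get();
        return value != null ? value : fallback;
    }
}

public class ConventionDemo {
    public static void main(String[] args) {
        Wrapper<String> empty = () -> null;
        System.out.println(empty.orElse("fallback")); // fallback
    }
}
```

Both orderings compile; the convention is purely about consistency with 'public static' and readability when scanning an interface.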
From paul.sandoz at oracle.com Tue Feb 26 07:41:14 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 26 Feb 2013 16:41:14 +0100 Subject: Code review request In-Reply-To: <512CBD93.1080805@univ-mlv.fr> References: <512672E6.1050708@oracle.com> <51289F3D.1010609@univ-mlv.fr> <458B2D62-336C-429C-B835-DEEC7031004B@oracle.com> <512CBD93.1080805@univ-mlv.fr> Message-ID: <20D89B20-0604-45E9-AAA7-38F74298C77A@oracle.com> On Feb 26, 2013, at 2:50 PM, Remi Forax wrote: > On 02/25/2013 06:31 PM, Paul Sandoz wrote: >> Hi Remi, >> >> Thanks for the feedback i have addressed some of this, mostly related to inner classes, in following change set to the lambda repo: >> >> http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3e50294c68ea >> >> We can update the webrev next week. > > > > There are still some methods that are declared 'default public' Where? > and some that are declared just with 'default'. > > I propose the following code convention for abstract/default method in interface. > All methods in interface are marked public (just because we may support private static method in jdk9), > default method should be 'public default' and not 'default public', like we have public static, visibility modifier first, > and abstract methods in the same interface should be declared only 'public'. > I do not relish your proposal of changing all abstract methods in interfaces to be declared redundantly public because of potential future features; even if such features are highly likely, we should have that discussion when those features arrive. The source in the java.util.function package uses "public default" for default methods. That source has been through a round of reviews, strongly indicating this was the preferred approach. Mike, is that so? Paul. From mike.duigou at oracle.com Tue Feb 26 08:56:00 2013 From: mike.duigou at oracle.com (Mike Duigou) Date: Tue, 26 Feb 2013 08:56:00 -0800 Subject: Where has the map method on Optional moved?
In-Reply-To: References: <5C64FDEE-13CC-4256-A09E-82614ACD4ADB@oracle.com> <7673857C-B407-489F-AB19-7B5317AB9AE7@oracle.com> Message-ID: <5A2AA78F-5A1B-4117-84AF-6E87A5B1D396@oracle.com> On Feb 26 2013, at 05:02 , Tim Peierls wrote: > On Tue, Feb 26, 2013 at 7:49 AM, Paul Sandoz wrote: >> Keep Optional, and keep it lean and mean. (At least as lean as the Guava Optional.) > I notice Guava's Optional has a transform method: > > Right -- "at least as lean". But leaner is better. I haven't used the transform method. It feels like the top of the slippery slope that leads to monstrosities like Map<K, Optional<V>>. > > >> One of the reasons Doug Lea is not an Optional fan is fear of things like Collection<Optional<T>>, something that is less likely to happen with a minimalist Optional. > Especially so if we make it more difficult by removing the hashCode and equals methods. > > Yes! With docs explaining why. I am working on this now. The methods won't be removed but will support only the identity hashCode()/equals(). > > --tim > From daniel.smith at oracle.com Tue Feb 26 13:47:10 2013 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 26 Feb 2013 14:47:10 -0700 Subject: flatMap ambiguity Message-ID: A thread on lambda-dev highlighted a problem with the overloading of flatMap:

<R> Stream<R> flatMap(FlatMapper<? super T, R> mapper)
IntStream flatMap(FlatMapper.ToInt<? super T> mapper)
LongStream flatMap(FlatMapper.ToLong<? super T> mapper)
DoubleStream flatMap(FlatMapper.ToDouble<? super T> mapper)

These functional interfaces have corresponding descriptors:

(T, Consumer<R>) -> void (R is inferred)
(T, IntConsumer) -> void
(T, LongConsumer) -> void
(T, DoubleConsumer) -> void

This violates the general rule that overloading with functional interfaces of the same shape shouldn't use different parameter types.
Various ambiguities result:

- "(x, sink) -> sink.accept(10)" is compatible with all the primitive consumers, and we have no way to disambiguate (the new most-specific rules handle this sort of thing with return types, but are not designed to decide which type is "better" for an arbitrary block of code).

- "(x, sink) -> sink.accept(22.0)" is compatible with "(T, DoubleConsumer) -> void", and leads "(T, Consumer<R>) -> void" to be provisionally applicable -- we don't check the body at all in that case, until we've had a chance to look at the target type of the 'flatMap' invocation and figure out what R is supposed to be. So, again, an ambiguity will occur.

It would probably be best to give the primitive versions distinct names. --- Note also that an invocation like the following will always produce a Stream<Object>: stream.flatMap((x, sink) -> sink.accept("x")).filter(...).... Inference is forced to resolve R without knowing anything about it, and so it must go with the default "Object". The only way to get useful information about R is to derive bounds from the body of the lambda, and that's simply not something we can do in general. I don't know what to recommend in this case, except perhaps that this method inherently depends on some explicit typing (e.g., "(String x, Consumer<String> sink) -> ..."). --Dan From maurizio.cimadamore at oracle.com Tue Feb 26 14:54:20 2013 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Tue, 26 Feb 2013 22:54:20 +0000 Subject: flatMap ambiguity In-Reply-To: References: Message-ID: <512D3D1C.2040508@oracle.com> On 26/02/13 21:47, Dan Smith wrote: > Note also that an invocation like the following will always produce a Stream<Object>: > > stream.flatMap((x, sink) -> sink.accept("x")).filter(...).... Mostly unrelated: the current implementation will actually give up in such cases, issuing a 'cyclic inference' error message to give opportunity to add more type info. Maurizio
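Distinct names are in fact how this settled out in the JDK 8 that eventually shipped: the primitive-producing variants became flatMapToInt, flatMapToLong and flatMapToDouble, taking functions that return the corresponding stream type, so overload resolution never has to compare lambda bodies. A sketch using the shipped API:

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class FlatMapNaming {
    public static void main(String[] args) {
        // Object-bearing flatMap: T -> Stream<R>, no overload clash
        long words = Stream.of("a b", "c d e")
                .flatMap(line -> Stream.of(line.split(" ")))
                .count();
        System.out.println(words); // 5

        // Primitive variant disambiguated by name, not by parameter type
        int totalLength = Stream.of("a b", "c d e")
                .flatMapToInt(line -> IntStream.of(line.length()))
                .sum();
        System.out.println(totalLength); // 8
    }
}
```

Because each method has a unique name, "(line -> ...)" is checked against exactly one functional interface, avoiding both the most-specific ambiguity and the provisional-applicability problem Dan describes.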