Streams design strawman

Thu Apr 26 01:09:22 PDT 2012

A couple of years ago I wrote my own parallel functional collections
library and in many ways it is similar to the strawman, since the libraries
are similar maybe some of my practical experience will be of interest.
Below I have made some brief comments into
three categories similarities/differences between strawman and mine, user
programmable short cut functions, simplifications, bi-streams, and
more radical proposals. I have written below as a series of short notes,
happy to expand the notes if people are interested.

   1. Similarities/Differences Between Strawman and My Libraries
      1. Similarities
         1. Compatible with existing Collections
         2. Can copy in from an existing collection
         3. Can copy out into a collection
         4. Mixture of lazy and eager operations, split is similar. I use
         List's methods, reduce, copy, and clone as eager and rest as lazy
      2. Differences
         1. I use a tuple library, your choice of a map probably better
         (see 4.1 below)
         2. My terminology is different, I have used your terminology in
         this email
         3. I don't support nulls, I find this really useful since I don't
         have to check for null in Mappers etc.
         4. My ParallelList extends AbstractSequentialList and implements
         Cloneable, List operations are eager and immutable. This is
useful since in
         general you don't need to copy out into a List for an old
API, since you
         already have a List. Concurrent modifications via an second eager
         evaluation fail fast.
         5. I copy into ParallelList using an Iterator, Iterable,
         Collection, array, or value and skip nulls, I also use use
the idioms of a
         new ParallelList(seed).flatMap(mapper) and
         ParallelList.range(from,to).map(mapper) to populate the
ParallelList (also
         see 2.7 below). I find it helpful to copy in since my internal
         representation is nothing like any of the existing
collections and it also
         means I don't have to watch for mutations.
         6. My Mapper etc, get passed the index as well as the value (I
         find this really useful!). One of the reasons why I don't
find sequential
         versions worth while (see 2.8 below).
         7. I have a static range method that creates a new ParallelList
         8. I only have parallel versions, I find the overhead of parallel
         small and since I pass the index (2.6) I don't have much use
for sequential
         versions
         9. I have static factories called in that mimic constructors but
         return a pre-made empty list rather than a new empty list as
a constructor
         would,
      2. User Programmable Short Cut Functions
      1. You can break my parallel loop by throwing a pre-made
      BreakException
      2. You can continue my parallel loop by throwing a pre-made
      ContinueException or returning null
      3. Break and continue are useful for translating existing loops into
      a ParallelList
      4. Break and continue are useful for short cut operators, e.g. Or and
      And
   3. Simplifications (reduction in API 'surface area")
      1. The 'surface area' of my API is small since the only public class
      is ParallelList and it has constructors and factories for Iterable and
      Iterator, therefore you could reduce your API to defender methods called
      parallel in Iterable and Iterator and classes ParallelList and
ParallelMap
      (aslo see point 5.1 below)
   4. Bi-Streams
      1. I use a tuple library, but the Bi-Stream idea is interesting and
      well worth pursuing since I have noticed some overhead with tuples
   5. More Radical Proposals
      1. Instead of basing everything on ParallelList<V> you could base
      everything on ParallelMap<K,V>:

class ParallelMap<K,V> extends AbstractMap<K,V> implements Cloneable,
Serializable {

  // lazy parallel operations

  // eager reduce and clone operations

  // Map operations are all eager, only immutable operations supported

}

class ParallelIndexed<V> extends ParallelMap<Long,V> implements
NavigableMap<Long,V> {

  V get(long index);

  // other get like methods with long index, but not put like methods
(immutable)

  // NavigableMap and hence SortedMap operations are all eager, only
immutable operations supported
  // indices must be continuous from firstKey to lastKey

}

class ParallelGroup<K,Vs> extends ParallelMap<K,ParallelIndexed<Vs>> { ... }

This reduces the API 'surface area' further, since you now only have maps
that are sub-classes of ParallelMap. This technique fits well with groupBy
since it would return a ParallelGroup<U,V>.

Hope you find these comments useful.

 -- Howard.

On 19 April 2012 12:49, Brian Goetz <brian.goetz at oracle.com> wrote:

> I've posted a strawman writeup of the key concepts we're thinking about
> for the libraries at:
>
>   http://cr.openjdk.java.net/~briangoetz/lambda/collections-overview.html
>
> The design is still rough, and we've started checking in an even rougher
> implementation into the lambda repository.  There are lots of omissions,
> errors, and open questions.  But it is at a point where we've got a
> consistent enough strawman for people to start playing with it, and
> share their experience.
>
> As the document suggests, there are a large number of open issues, and
> we've temporarily ignored these for the purposes of pulling together
> something that captures the basic functionality.  We've deliberately
> deferred addressing many sticky issues including attachment to
> collections, primitive specialization, checked exceptions, infinite
> streams, IO-backed streams, nulls, and of course, naming.
>
> These are all sticky issues, and we'll get to them, but for now, I'd
> like to leave them be for a little while longer, so that we can shake
> out the key functionality first.
>
> What I am asking of the community is: TRY OUT THE CODE.  *The single
> most valuable thing that the community can do to help move this effort
> forward, and improve the quality of the result, is to try out the code
> early and often.*  (A single practical experience report is worth many
> armchair observations and suggestions!)
>
> What's in the lambda repo right now is still a little rough, but within
> the next few days it should be cleaned up enough for people to be able
> to try it out on their code.  We would love it if people could try to
> lambdify some small bit of their code base, and report back on how it
> worked -- whether it worked great, or not.  Also valuable is where you
> got stuck, such as "I had trouble converting this pattern to use the new
> libraries."
>
>
>
>
>
>

-- 
  -- Howard.