Initial prototype for BiStream

Brian Goetz brian.goetz at oracle.com
Sat Apr 21 10:51:02 PDT 2012


>> Hello all;
>>
>> I've committed an initial prototype for Bi-Value streams aka MapStream.
>
> (It would have been helpful to give the hint that these are
> currently hiding inside java.lang.Iterable!)

Yes, that's one of the "open issues" as mentioned in the latest SotLL 
post -- where is the best place to attach the stream functionality to 
the existing libraries, and what is the ideal form for a stream to 
acquire its elements.  Itera{tor,ble} are problematic for many reasons; 
consider the current implementation an expedient hack to get to 
something we can experiment with while we figure out the answers to 
these questions.

> I think the left/right terminology in Bi forms is a little odd.
> Too odd.

Yes, odd.  Another of the "open issues" is whether the two-valued stream 
should be considered a stream of pair objects (boxing issues), an 
undifferentiated two-valued stream (which requires addressing the above 
question: what does a bi-valued iterator look like?), or a stream of 
key-value pairs where the keys are unique.  There is room for each of 
these (Maps already box into Map.Entry pairs; the result of a zip 
operation is a stream of unrelated pairs; many other operations, e.g. 
groupBy, result in key-value streams that have unique keys).  But we 
probably don't want to have Stream<Pair<X,Y>> AND BiStream<X,Y> and 
MapStream<X,Y>.

The left/right terminology is a stake-in-the-ground attempt to explore 
"what if we just have a bi-valued stream and fit the others into that 
model."
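
To make the three candidate shapes concrete, here is a rough sketch.  It 
uses java.util.function.BiConsumer and Collectors.groupingBy as they 
eventually shipped in Java 8, not the prototype API discussed in this 
thread; the class and method names (TwoValuedShapes, zipBoxed, zipEach, 
byLength) are made up for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from the prototype.
public class TwoValuedShapes {

    // Shape 1: a sequence of pair objects. Every element allocates an Entry.
    static <X, Y> List<Map.Entry<X, Y>> zipBoxed(List<X> xs, List<Y> ys) {
        List<Map.Entry<X, Y>> out = new ArrayList<>();
        int n = Math.min(xs.size(), ys.size());
        for (int i = 0; i < n; i++) {
            out.add(new SimpleImmutableEntry<>(xs.get(i), ys.get(i)));
        }
        return out;
    }

    // Shape 2: an undifferentiated two-valued traversal. The two values are
    // handed to the callback directly, so no pair object is created.
    static <X, Y> void zipEach(List<X> xs, List<Y> ys,
                               BiConsumer<? super X, ? super Y> sink) {
        int n = Math.min(xs.size(), ys.size());
        for (int i = 0; i < n; i++) {
            sink.accept(xs.get(i), ys.get(i));
        }
    }

    // Shape 3: key-value data with unique keys, such as the result of a grouping.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }
}
```

The second form is roughly what "just have a bi-valued stream" means: 
the consumer sees two arguments and never a Pair.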

> Plus there's no obvious place to go for Tri forms, and so on. "up"?

Yes, at some point we will find ourselves up a stream without a paddle :)

Realistically, we have a limited namespace budget for bridging the 
nominal-structural divide.  Without tuple types and varargs generics 
(neither is coming in 8), what we provide here will necessarily be an 
approximation of all the use cases we want to cover.  (Remember, we've 
also got primitive specializations on the "to be considered" list.)

> More common would be fst/snd. Or I suppose, first/second.

Or key/value.

> Yeah. Packing/boxing on each method call is not going to win any friends
> (especially when the call itself is likely to box/pack/unbox/etc).
> The history of JDK classes that initially assumed that this was
> not going to be a serious problem is not a pretty one. (For a random
> old example, someone's initial version on WeakHashMap that required
> construction of a Weak ref on each get() was too awful to
> ever actually use.) Hoping that it will somehow magically come out
> differently this time doesn't seem too appealing.

Agreed.  This is why I would like to keep the Pair interface out of the 
API almost completely, if we can.  And we clearly need to address the 
issue of map-to-primitive / reduce-over-primitive without boxing.
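
As a sketch of what "without boxing" buys, compare the two reductions 
below.  This assumes the primitive specializations in the form they 
later took in Java 8 (Stream.mapToInt and IntStream.sum), which is not 
necessarily what the prototype offers today; the class and method names 
are invented for the example.

```java
import java.util.List;

// Hypothetical illustration; PrimitiveReduce is not part of any JDK API.
public class PrimitiveReduce {

    // Boxed path: map() produces a Stream<Integer>, so each length is wrapped
    // in an Integer and the reduction unboxes and reboxes as it accumulates.
    static int totalLengthBoxed(List<String> words) {
        return words.stream()
                    .map(String::length)
                    .reduce(0, Integer::sum);
    }

    // Primitive path: mapToInt() produces an IntStream, so the lengths stay
    // as plain int values through the reduction.
    static int totalLengthPrimitive(List<String> words) {
        return words.stream()
                    .mapToInt(String::length)
                    .sum();
    }
}
```

Both methods compute the same total; the difference is only in whether 
intermediate values are Integer objects or ints.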

> On the other hand, if these are carefully enough done, and
> if first-rate tuple/struct support ever does make it into Java,
> then in the long run, using tuples here could be the most
> efficient way to do it.

Predicting the future is hard; making decisions about how much to 
sacrifice in the present in order to blend well with what might happen 
in the future is really hard.  If we had tuples today, this discussion 
would have been over before it started.  If we knew tuples were coming 
tomorrow, it would be almost as short.  Given that all we know is that 
it would be really nice to have tuples someday, it's not clear how much 
we should distort the current design to support that happy future.  I 
think our best bet is to avoid using tuple-like constructs in the APIs 
(since they suck now), and thereby minimize the potential conflict with 
any future tuple design.  (If Java 10 gives us fast immutable tuples 
with a fast conversion to a nominal tuple-like representation, then 
perhaps the worst aspect would be that the things we do to work around 
their lack -- the BiXxx abstractions and the primitive specializations 
-- just become historical warts when Stream<Tuple<int,int>> becomes the 
natural way to express the zip of two int-streams.  That doesn't sound 
worse than the alternatives of "do nothing now" or "do something that 
sucks now".)

> So what do you do with a decision that is in the short run a
> BIG performance loss but in the long run is possibly a small
> performance win, and in some people's minds, a nicer API?

Not clear that the pairs approach leads to a nicer API.  Take Map; 
people like Map.  We could have said "Map extends 
Collection<Entry<K,V>>", but we didn't; the only place where the Pair 
concept pokes up its ugly head is in entrySet.  I think the win here, in 
the world as it now stands, is to continue down that path: there are two 
shapes of streams (linear and key-value), just as there have been two 
shapes of collections.
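
A small sketch of the point about Map, using only long-standing 
java.util APIs: the everyday operations take and return bare keys and 
values, and Map.Entry shows up only when iterating entrySet.  The class 
and method names (MapShape, count, render) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; MapShape is not part of any JDK API.
public class MapShape {

    // Building the map: get() and put() deal in keys and values, never pairs.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : words) {
            Integer old = counts.get(w);
            counts.put(w, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Traversing key and value together: the one place Map.Entry appears.
    static String render(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }
}
```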

I think what we lose in the short run is "native" support for tri-valued 
streams.  While that loss of generality is inelegant, the two-shape 
approach still provides a practical path for people to get most real 
problems solved with elegant-looking code.

In reality I think one of the hardest parts of the current effort will 
be finding the right names for all the variants of the same base type 
and operation (e.g., BiPredicate, IntPredicate, IntStream).  First, 
of course, we need to settle on the core shapes and operations.

> The best answer might be a variant of my usual stance on all
> this stuff: that it is OK to let you folks continue
> to find the most defensible APIs for the FP-friendly side of this,
> and all will end up well if I also put into place highly efficient
> pre-loop-fused, possibly-mutative, etc., direct methods in
> ConcurrentHashMap and BetterArrayList, in part as a safeguard
> to avert disaster. Maybe someday compilers/JVMs will figure out how to
> translate into these or even better forms. But in the mean time there
> will still be a teachable path for users who need/want it.

Indeed.  We have banked heavily on you being right here :)

More information about the lambda-dev mailing list