Initial prototype for BiStream
Brian Goetz
brian.goetz at oracle.com
Sat Apr 21 10:51:02 PDT 2012
>> Hello all;
>>
>> I've committed an initial prototype for Bi-Value streams aka MapStream.
>
> (It would have been helpful to give the hint that these are
> currently hiding inside java.lang.Iterable!)
Yes, that's one of the "open issues" as mentioned in the latest SotLL
post -- where is the best place to attach the stream functionality to
the existing libraries, and what is the ideal form for a stream to
acquire its elements. Itera{tor,ble} are problematic for many reasons;
consider the current implementation an expedient hack to get to
something we can experiment with while we figure out the answers to
these questions.
> I think the left/right terminology in Bi forms is a little odd.
> Too odd.
Yes, odd. Another of the "open issues" is whether the two-valued stream
should be considered a stream of pair objects (boxing issues), an
undifferentiated two-valued stream (which requires addressing the above
question: what does a bi-valued iterator look like?), or a stream of
key-value pairs where the keys are unique. There is room for each of
these (Maps already box into Map.Entry pairs; the result of a zip
operation is a stream of unrelated pairs; many other operations result
in key-value streams that have unique keys (e.g., groupBy)). But we
probably don't want to have Stream<Pair<X,Y>> AND BiStream<X,Y> and
MapStream<X,Y>.
The left/right terminology is a stake-in-the-ground attempt to explore
"what if we just have a bi-valued stream and fit the others into that
model."
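To make the difference between "stream of pair objects" and "bi-valued
stream" concrete, here is a rough sketch of a zip done both ways. All
names in it (BiShapeSketch, Pair, BiSink, zipToPairs, zipInto) are
hypothetical and are not the prototype's API; it only assumes lambda
syntax and plain java.util collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical names throughout; this is not the prototype's API.
final class BiShapeSketch {

    // Shape 1: a boxed pair, costing one allocation per element.
    static final class Pair<L, R> {
        final L left;
        final R right;
        Pair(L left, R right) { this.left = left; this.right = right; }
    }

    // Shape 2: a two-argument callback, so no pair object is required.
    interface BiSink<L, R> {
        void accept(L left, R right);
    }

    // Zip as a list of pairs: allocates a Pair for every element.
    static <L, R> List<Pair<L, R>> zipToPairs(List<L> ls, List<R> rs) {
        List<Pair<L, R>> out = new ArrayList<>();
        Iterator<R> ri = rs.iterator();
        for (L l : ls) {
            if (!ri.hasNext()) break;          // stop at the shorter input
            out.add(new Pair<>(l, ri.next()));
        }
        return out;
    }

    // Zip as a bi-valued traversal: both values go straight to the sink.
    static <L, R> void zipInto(List<L> ls, List<R> rs,
                               BiSink<? super L, ? super R> sink) {
        Iterator<R> ri = rs.iterator();
        for (L l : ls) {
            if (!ri.hasNext()) break;          // stop at the shorter input
            sink.accept(l, ri.next());
        }
    }

    // Demo helper: renders the bi-valued traversal as "name=age;" text.
    static String render(List<String> names, List<Integer> ages) {
        StringBuilder sb = new StringBuilder();
        zipInto(names, ages,
                (n, a) -> sb.append(n).append('=').append(a).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", "bob");
        List<Integer> ages = Arrays.asList(31, 27);
        System.out.println(zipToPairs(names, ages).size());
        System.out.println(render(names, ages));
    }
}
```

The second form never exposes a pair type to the caller, which is the
property the left/right experiment is trying to preserve.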
> Plus there's no obvious place to go for Tri forms, and so on. "up"?
Yes, at some point we will find ourselves up a stream without a paddle :)
Realistically, we have a limited namespace budget for bridging the
nominal-structural divide. Without tuple types and varargs generics
(neither is coming in 8), what we provide here will necessarily be an
approximation of all the use cases we want to cover. (Remember, we've
also got primitive specializations on the "to be considered" list.)
> More common would be fst/snd. Or I suppose, first/second.
Or key/value.
> Yeah. Packing/boxing on each method call is not going to win any friends
> (especially when the call itself is likely to box/pack/unbox/etc).
> The history of JDK classes that initially assumed that this was
> not going to be a serious problem is not a pretty one. (For a random
> old example, someone's initial version of WeakHashMap that required
> construction of a Weak ref on each get() was too awful to
> ever actually use.) Hoping that it will somehow magically come out
> differently this time doesn't seem too appealing.
Agreed. This is why I would like to keep the Pair interface out of the
API almost completely, if we can. And we clearly need to address the
issue of map-to-primitive / reduce-over-primitive without boxing.
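As an illustration of the map-to-primitive / reduce-over-primitive
concern, the sketch below sums a mapped value two ways. It assumes the
functional interface shapes Function and ToIntFunction from
java.util.function as they appear in Java 8, which may not match the
prototype; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToIntFunction;

// Hypothetical helper class; not part of the prototype.
final class PrimitiveReduceSketch {

    // Boxed version: every mapped value and every partial sum is an Integer.
    static <T> Integer sumBoxed(List<T> xs, Function<? super T, Integer> f) {
        Integer acc = 0;
        for (T x : xs) {
            acc = acc + f.apply(x);   // unbox both operands, add, box the result
        }
        return acc;
    }

    // Primitive version: the mapper returns int, so the sum stays unboxed.
    static <T> int sumInt(List<T> xs, ToIntFunction<? super T> f) {
        int acc = 0;
        for (T x : xs) {
            acc += f.applyAsInt(x);   // no Integer objects involved
        }
        return acc;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(sumBoxed(words, String::length));
        System.out.println(sumInt(words, String::length));
    }
}
```

Both methods compute the same total; only the second avoids creating
wrapper objects along the way.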
> On the other hand, if these are carefully enough done, and
> if first-rate tuple/struct support ever does make it into Java,
> then in the long run, using tuples here could be the most
> efficient way to do it.
Predicting the future is hard; making decisions about how much to
sacrifice in the present in order to blend well with what might happen
in the future is really hard. If we had tuples today, this discussion
would have been over before it started. If we knew tuples were coming
tomorrow, it would be almost as short. Given that all we know is that
it would be really nice to have tuples someday, it's not clear how much
we should distort the current design to support that happy future. I
think our best bet is to avoid using tuple-like constructs in the APIs
(since they suck now), and thereby minimize the potential conflict with
any future tuple design. (If Java 10 gives us fast immutable tuples
with a fast conversion to a nominal tuple-like representation, then
perhaps the worst aspect would be that the things we do to work around
their lack -- the BiXxx abstractions and the primitive specializations
-- just become historical warts when Stream<Tuple<int,int>> becomes the
natural way to express the zip of two int-streams. That doesn't sound
worse than the alternatives of "do nothing now" or "do something that
sucks now".)
> So what do you do with a decision that is in the short run a
> BIG performance loss but in the long run is possibly a small
> performance win, and in some people's minds, a nicer API?
Not clear that the pairs approach leads to a nicer API. Take Map;
people like Map. We could have said "Map extends
Collection<Entry<K,V>>", but we didn't; the only place where the Pair
concept pokes up its ugly head is in entrySet. I think the win here, in
the world as it now stands, is to continue down that path: there are two
shapes of streams (linear and key-value), just as there have been two
shapes of collections.
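For comparison, here is a small sketch of the two ways of walking a Map:
through entrySet, where the pair-like Map.Entry shows up, and through a
two-argument callback, where it does not. It assumes the
Map.forEach(BiConsumer) method as found in Java 8; MapShapeSketch and
its methods are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of the prototype.
final class MapShapeSketch {

    // Pair-shaped traversal: the caller handles Map.Entry objects.
    static int totalViaEntries(Map<String, Integer> m) {
        int total = 0;
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            total += e.getKey().length() * e.getValue();
        }
        return total;
    }

    // Key-value-shaped traversal: key and value arrive as two arguments.
    static int totalViaBi(Map<String, Integer> m) {
        int[] total = {0};   // single-element array so the lambda can update it
        m.forEach((k, v) -> total[0] += k.length() * v);
        return total[0];
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("ab", 3);
        m.put("c", 4);
        System.out.println(totalViaEntries(m));
        System.out.println(totalViaBi(m));
    }
}
```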
I think what we lose in the short run is "native" support for tri-valued
streams. The loss of generality is inelegant, but this approach still
provides a practical path for people to get most real problems solved
with elegant-looking code.
In reality I think one of the hardest parts of the current effort will
be finding the right names for all the variants of the same base type
and operation (e.g., BiPredicate, IntPredicate, IntStream, etc.). First,
of course, we need to settle on the core shapes and operations.
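To show how quickly the variants multiply, here is one "test a value"
operation in its one-argument, two-argument, and int-specialized forms.
The sketch assumes the Predicate, BiPredicate, and IntPredicate
interfaces as found in java.util.function in Java 8; the enclosing class
and constant names are hypothetical.

```java
import java.util.function.BiPredicate;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

// Hypothetical holder class; only the three interfaces are from the JDK.
final class PredicateVariantsSketch {

    // Base shape: one reference argument.
    static final Predicate<String> NON_EMPTY = s -> !s.isEmpty();

    // "Bi" shape: two reference arguments, no pair object.
    static final BiPredicate<String, Integer> LONGER_THAN =
            (s, n) -> s.length() > n;

    // Primitive specialization: one int argument, no boxing.
    static final IntPredicate IS_EVEN = i -> i % 2 == 0;

    public static void main(String[] args) {
        System.out.println(NON_EMPTY.test("x"));
        System.out.println(LONGER_THAN.test("abc", 2));
        System.out.println(IS_EVEN.test(4));
    }
}
```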
> The best answer might be a variant of my usual stance on all
> this stuff: that it is OK to let you folks continue
> to find the most defensible APIs for the FP-friendly side of this,
> and all will end up well if I also put into place highly efficient
> pre-loop-fused, possibly-mutative, etc., direct methods in
> ConcurrentHashMap and BetterArrayList, in part as a safeguard
> to avert disaster. Maybe someday compilers/JVMs will figure out how to
> translate into these or even better forms. But in the meantime there
> will still be a teachable path for users who need/want it.
Indeed. We have banked heavily on you being right here :)