Primitive streams and optional

Sat Nov 24 09:13:14 PST 2012

Just in case anyone is interested in re-deciding some basics: in light
of the continuing saga of unappealing API choices, here's one last
push for adopting the j.u.c null policies in streams.

Sorry that I can't think of a good way to present this without
stepping back into prehistory!

Long ago (1950s), people noticed that there are two basic flavors of
data: values and pointers. A value is just, um, a value. A pointer
differs conceptually in that it might not point to anything. Hence the
invention of null, as a special state of a pointer, that for economy,
is encoded as the special value zero if null, else a (possibly
virtualized etc) memory address. (One disadvantage of this encoding is
that it loses type information -- an early form of "erasure". A null
pointer to an int looks the same as a null pointer to a double, etc.)

Only slightly less long ago (late 1960s), people noticed that
pointer-like notions could be elevated to the idea of "references to
objects" (in early forms, an object's pointer address was its
identity). But still with the notion that a reference might not point
anywhere.

So, now we have four different concepts:
1. values
2. possibly null pointers to values
3. objects
4. possibly null references to objects

The possibly-null case naturally occurs with partial functions and
methods, often related to lookup/search: get the thing at some
uninitialized array position, or in a hash map without a binding,
etc. Also for terminals in linked data structures.  You need some way
to say that there is no such thing there.
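
As a small illustration of the lookup cases just mentioned, here is a
sketch using only java.util. The class name NothingThere and the
helper lookup are made up for this example; the point is only that an
unbound key and an uninitialized array slot both report "nothing
there" as null.

```java
import java.util.HashMap;
import java.util.Map;

public class NothingThere {
    // Hypothetical helper: returns the value bound to key,
    // or null when the map has no binding for it.
    static Integer lookup(Map<String, Integer> map, String key) {
        return map.get(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<String, Integer>();
        ages.put("alice", 30);

        System.out.println(lookup(ages, "alice")); // a binding exists
        System.out.println(lookup(ages, "bob"));   // no binding: null

        // An object array position that was never assigned is also null.
        String[] slots = new String[2];
        System.out.println(slots[0] == null);
    }
}
```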

The FP (and ADT) folks had an arguably easier time of this, since they
only encountered cases (1) and (2). Still they had the notion of a
compound-value, which is like an object, but has no defined identity.
Any partial function that "should" return an X but need not can
instead return an Optional<X>. And the most common technique for
implementing this notion is to "box" the value when present, else
return null. The programmer is never exposed to this though.
For example, using "==" on a boxed vs unboxed int does the same thing
(comparing values, not the "invisible" pointers).

The pure OO (Smalltalk etc) folks also in principle had an easier
time, since they conceptually dealt only with cases (3) and (4).
Since everything is an object, everything worked uniformly. (Although
many people now think in retrospect that "nullable" should have been a
required part of any method return type spec so that programmers know
when nulls might legitimately vs accidentally appear. JSR308 might
help with this though.)  However people don't appreciate it when "=="
always compares pointers (among other issues) for integers, so special
rules were made for these cases, that are basically the inverse of the
FP approach.  That is, in FP, pointerness is hidden; in OO,
pointerless-valueness is hidden, but less so. For example, Integers
are objects with identity, monitors, etc, (and so are unlike
"Optional<int>" if such a thing existed) and you can readily tell if
you have an Integer vs an int. On the other hand, you can still use
ints as (autoboxed) objects inside collections etc without needing to
have a special implementation just for ints (at the price of
now-famous space bloats).
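
To make the Integer vs int distinction concrete, here is a minimal
sketch. The class name BoxedVsPrimitive and the helper sameValue are
invented for illustration. It deliberately avoids relying on "==" for
boxed values, since that compares references and the outcome for
values outside the small cached range is not something to depend on.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxedVsPrimitive {
    // Hypothetical helper: compares two boxed Integers by value.
    static boolean sameValue(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        int x = 1000, y = 1000;
        System.out.println(x == y);          // ints: "==" compares values

        Integer a = 1000, b = 1000;
        System.out.println(sameValue(a, b)); // value comparison via equals
        // "a == b" would compare references, not values.

        Integer nothing = null;              // an Integer reference may be null;
        System.out.println(nothing == null); // an int never can be

        // ints can be used inside collections via autoboxing,
        // at the cost of one Integer object per element.
        List<Integer> list = new ArrayList<Integer>();
        list.add(x);                         // boxed on the way in
        int back = list.get(0);              // unboxed on the way out
        System.out.println(back);
    }
}
```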

Any language/library that embraces both of these notions together has
to do something that is not identical to either pure FP or OO
approaches. Some languages get a foothold by distinguishing object
types from value types. Thus, nullness applies to objects,
optionalness applies to values.  So, Scala, Lime, etc have variants
of:

1. value types: int, double etc
2. Optional<V>: the result of partial functions on value types
3. object types (Object and subclasses)
4. refs: possibly null references to objects

We don't have this foothold.  Arguably, because of this, we should not
be creating such frameworks. But we are.

So the choices are:

A. Pretend we have value types. Introduce Optional for use with any
value-like things, along with some set of conventions about how they
interact with objects and possibly null refs.

B. Don't pretend we have value types unless/until we have them.  Use
the standard OO conventions, in which boxing classes like Integers are
used when you need to elevate a value to objecthood. And when you have
one, you have a full-fledged object, not just an invisible pointer.
And when you don't have one, you just have null.

Choice A is tempting because of its familiarity to programmers with
an FP background. But doing so forces a never-ending set of bandaids
(as we've seen lately) because none of the rules for interoperating
with Object conventions make much sense.

Sticking with (B) is less tempting to some people not only because
they like to think of some of their classes in value-like ways, but
also because streams (like java.util.concurrent) would need to
relentlessly maintain the "null means nothing there" policy. So,
emptyStream.reduce(f) must return null, null elements appearing in
streams must be skipped, etc. But not only is this the most defensible
policy to use in the absence of true value types, it is best suited to
kludgelessly evolve to embrace value types if they are ever supported.
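
For concreteness, here is a rough sketch of what this policy looks
like. The first half uses existing j.u.c classes, which reject null
elements so that a null result can only mean "nothing there". The
second half is a purely hypothetical reduce (not any proposed streams
API) that skips null elements and returns null when there is nothing
to reduce. It assumes the combining function itself never returns
null.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BinaryOperator;

public class NullMeansNothing {
    // Hypothetical reduce under the "null means nothing there" policy.
    // Assumes f never returns null.
    static <T> T reduce(Iterable<T> elements, BinaryOperator<T> f) {
        T result = null;
        for (T e : elements) {
            if (e == null) continue;             // null elements are skipped
            result = (result == null) ? e : f.apply(result, e);
        }
        return result;                           // null if nothing was there
    }

    public static void main(String[] args) {
        // Existing j.u.c behavior: nulls cannot be stored, so a null
        // return is unambiguous.
        ConcurrentHashMap<String, Integer> map =
            new ConcurrentHashMap<String, Integer>();
        map.put("a", 1);
        Integer missing = map.get("b");          // no binding
        System.out.println(missing == null);

        Queue<String> queue = new ConcurrentLinkedQueue<String>();
        String head = queue.poll();              // empty queue
        System.out.println(head == null);

        // The hypothetical reduce applied to the two cases in the text.
        Integer total = reduce(Arrays.asList(1, null, 2, 3), Integer::sum);
        Integer empty = reduce(Collections.<Integer>emptyList(), Integer::sum);
        System.out.println(total);
        System.out.println(empty == null);
    }
}
```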

There is also a choice C: always throw exceptions for partial
functions / nothing-there cases. The logic of this is fine, and is
completely reasonable when nothing-there-ness is accidental or
exceptional. But the world voted against the painfulness and
inefficiency of everyday programming under this encoding of
nothing-there decades ago.
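
The existing java.util.Queue interface happens to offer both
encodings side by side, which makes the contrast easy to see. This is
only an illustration; the class name ThrowVersusNull and its two
helpers are invented here.

```java
import java.util.ArrayDeque;
import java.util.NoSuchElementException;
import java.util.Queue;

public class ThrowVersusNull {
    // Choice C style: an empty queue is reported by an exception.
    static String takeOrThrow(Queue<String> q) {
        return q.remove();
    }

    // Null-policy style: an empty queue is reported by null.
    static String takeOrNull(Queue<String> q) {
        return q.poll();
    }

    public static void main(String[] args) {
        Queue<String> q = new ArrayDeque<String>();

        // Everyday "is anything there?" code under the exception encoding.
        try {
            System.out.println(takeOrThrow(q));
        } catch (NoSuchElementException e) {
            System.out.println("nothing there (exception)");
        }

        // The same question under the null encoding.
        String s = takeOrNull(q);
        if (s == null) {
            System.out.println("nothing there (null)");
        }
    }
}
```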

Summary: get rid of Optional. Use null consistently to mean nothing
there (plus exceptions in exceptional cases). Use the standard boxed
types for numerics. Until/unless there are value types, create
intStream etc as a separate set of classes with merely analogous APIs.
(And while we are at it, add LongKeyHashMap and a few others!)
Don't worry about people who used null as "meaningful" elements, map
keys or map values.  No one is forcing them to use streams.

-Doug