Primitive streams and optional

Brian Goetz brian.goetz at oracle.com
Mon Nov 26 12:43:09 PST 2012


Note that all of these concerns about nulls are mostly a matter of 
choosing which null anomalies we are least uncomfortable with, since 
there is no choice that is fully anomaly-free.

Additionally, there are two sides to consider:
  - What should we do with null values when processing streams;
  - How do we represent the APIs of pipelines which are essentially 
partial functions on streams (e.g., findFirst on an empty stream has no 
result.)

In the thread rooted at 
http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2012-September/000158.html, 
we looked at the first question, and came up with four buckets:

1.  Ban nulls.  This is equivalent to adding
    .tee(e -> { if (e == null) throw new NPE(); })
between all stages of a pipeline.

2.  Ignore nulls.  This is what Doug is proposing, and is equivalent to 
adding
    .filter(e -> e != null)
between all stages of a pipeline.

3.  Tolerate nulls.  This treats nulls as "just another value" and hopes 
that lambdas and downstream stages can deal with them.

4.  Embrace nulls as valid values.  Ensure that every operation can deal 
with nulls in a well-defined manner.  (This entails, for example, either 
dropping the Optional-bearing methods or making a present Optional deal 
with null.)
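
As a rough sketch of how buckets 1 through 3 differ in practice, assuming 
the java.util.stream API as it was eventually released in Java 8 (where 
peek() plays the role of tee() above).  The class and method names are 
made up for illustration and are not part of any proposal:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class NullPolicies {
    // Bucket 1 (ban): throw as soon as a null element reaches the next stage.
    static List<Integer> banNulls(List<String> words) {
        return words.stream()
                .peek(w -> { if (w == null) throw new NullPointerException(); })
                .map(String::length)
                .collect(Collectors.toList());
    }

    // Bucket 2 (ignore): silently drop nulls before the next stage.
    static List<Integer> ignoreNulls(List<String> words) {
        return words.stream()
                .filter(w -> w != null)
                .map(String::length)
                .collect(Collectors.toList());
    }

    // Bucket 3 (tolerate): nulls flow through; the lambda has to cope on its own.
    static List<Integer> tolerateNulls(List<String> words) {
        return words.stream()
                .map(w -> w == null ? null : (Integer) w.length())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", null, "bcd");
        System.out.println(ignoreNulls(words));   // [1, 3]
        System.out.println(tolerateNulls(words)); // [1, null, 3]
        // banNulls(words) would throw NullPointerException at the second element.
    }
}
```

In this sketch the three policies give three different results for the 
same input: an exception, a shorter list, or a list that still carries 
the null.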

Note that this problem doesn't completely go away if we ban nulls from 
collections; you can still have partial functions passed to map(), which 
could inject nulls into the downstream stages.  So even if we'd taken 
the path of "nulls not allowed in Collections" 15 years ago, we'd still 
be answering this question (though we might come to a different answer.)
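
To make the point concrete, here is a minimal sketch (again assuming the 
released Java 8 stream API; the class name and data are invented) in 
which the source list contains no nulls, yet a partial function passed 
to map() injects one:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartialMapDemo {
    // Map.get is a partial function: it returns null for keys it does not know.
    static List<String> capitals(List<String> countries, Map<String, String> capitalOf) {
        return countries.stream()
                .map(capitalOf::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> capitalOf = new HashMap<>();
        capitalOf.put("France", "Paris");

        // The input is null-free, but the mapped stream is not.
        System.out.println(capitals(Arrays.asList("France", "Atlantis"), capitalOf));
        // [Paris, null]
    }
}
```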


The other part of the question is what to do at the tail end of the 
pipeline for operations such as reduction or search which might not 
yield anything.  This consideration is not completely orthogonal to the 
first, but is also not fully constrained either.  While introducing an 
Optional box seems natural in the context of (3), we don't necessarily 
have to do that; we could stay with (3) but get rid of Optional and 
instead embrace the Map.get anomaly and use null to indicate "no value", 
leaving users to wonder "did it find nothing, or find a null?"  (And 
we'd have to do something even uglier for primitives.)
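
A small sketch of the ambiguity, for illustration only.  firstOrNull is 
a hypothetical null-returning search, not a proposed method; 
firstOptional uses findFirst() as it appears in the released Java 8 API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class NoValueDemo {
    // Hypothetical null-returning search: "no value" and "found a null" look the same.
    static String firstOrNull(List<String> xs) {
        return xs.isEmpty() ? null : xs.get(0);
    }

    // Optional-returning search: an empty stream is reported as an empty Optional.
    static Optional<String> firstOptional(List<String> xs) {
        return xs.stream().findFirst();
    }

    public static void main(String[] args) {
        List<String> empty = Collections.emptyList();
        List<String> justNull = Arrays.asList((String) null);

        System.out.println(firstOrNull(empty));     // null
        System.out.println(firstOrNull(justNull));  // null, indistinguishable from the above

        System.out.println(firstOptional(empty).isPresent());        // false
        System.out.println(firstOptional(Arrays.asList("a")).get()); // a
    }
}
```

Note that Optional does not remove the anomaly, it only moves it: in the 
released API, findFirst() is documented to throw NullPointerException 
when the selected element is null, so firstOptional(justNull) would fail 
rather than answer.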

As I said, I think this effort is largely a search for the least 
undesirable null anomalies.  Candidates include:
  - The map.get() anomaly
  - Lack of size/index preservation
  - Excessive null intolerance (NPEs when the operation could actually 
have completed)
  - Null interference (removing nulls that the user actually wanted)
  - Excessive boxing
  - More complex reasoning about when NPE might occur
  - ...
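
As one example from this list, the loss of size/index preservation under 
a skip-nulls policy can be sketched as follows (released Java 8 API 
assumed, names invented):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IndexDrift {
    // Skipping nulls means output position i no longer corresponds to input position i.
    static List<String> upperSkippingNulls(List<String> in) {
        return in.stream()
                .filter(s -> s != null)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", null, "cy");
        List<String> upper = upperSkippingNulls(names);

        System.out.println(names.size() + " -> " + upper.size()); // 3 -> 2
        System.out.println(upper.get(1)); // CY, although names.get(1) was null
    }
}
```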

I would suspect everyone has a different ordering of which anomalies 
they find most or least bothersome.

The current implementation takes the tack of (3) plus Optional.  Doug is 
arguing for (2) and no Optional.  (No one really liked (4), and no one 
really thought we could get away with (1).)  (3) and no Optional is also 
a possibility.


On 11/24/2012 12:13 PM, Doug Lea wrote:
>
> Just in case anyone is interested in re-deciding some basics in light
> of the continuing saga of unappealing API choices, here's one last
> push for adopting the j.u.c null policies in streams.
>
> Sorry that I can't think of a good way to present this without
> stepping back into prehistory!
>
> Long ago (1950s), people noticed that there are two basic flavors of
> data: values and pointers. A value is just, um, a value. A pointer
> differs conceptually in that it might not point to anything. Hence the
> invention of null, as a special state of a pointer, that for economy,
> is encoded as the special value zero if null, else a (possibly
> virtualized etc) memory address. (One disadvantage of this encoding is
> that it loses type information -- an early form of "erasure". A null
> pointer to an int looks the same as a null pointer to a double, etc.)
>
> Only slightly less long ago (late 1960s), people noticed that
> pointer-like notions could be elevated to the idea of "references to
> objects" (in early forms, an object's pointer address was its
> identity). But still with the notion that a reference might not point
> anywhere.
>
> So, now we have four different concepts:
> 1. values
> 2. possibly null pointers to values
> 3. objects
> 4. possibly null references to objects
>
> The possibly-null case naturally occurs with partial functions and
> methods, often related to lookup/search: get the thing at some
> uninitialized array position, or in a hash map without a binding,
> etc. Also for terminals in linked data structures.  You need some way
> to say that there is no such thing there.
>
> The FP (and ADT) folks had an arguably easier time of this, since they
> only encountered cases (1) and (2). Still they had the notion of a
> compound-value, which is like an object, but has no defined identity.
> Any partial function that "should" return an X but need not can
> instead return an Optional<X>. And the most common technique for
> implementing this notion is to "box" the value when present, else
> return null. The programmer is never exposed to this, though.
> For example, using "==" on a boxed vs unboxed int does the same thing
> (comparing values, not the "invisible" pointers).
>
> The pure OO (Smalltalk etc) folks also in principle had an easier
> time, since they conceptually dealt only with cases (3) and (4).
> Since everything is an object, everything worked uniformly. (Although
> many people now think in retrospect that "nullable" should have been a
> required part of any method return type spec so that programmers know
> when nulls might legitimately vs accidentally appear. JSR308 might
> help with this though.)  However people don't appreciate it when "=="
> always compares pointers (among other issues) for integers, so special
> rules were made for these cases, that are basically the inverse of the
> FP approach.  That is, in FP, pointerness is hidden; in OO,
> pointerless-valueness is hidden. But less hidden: for example, Integers
> are objects with identity, monitors, etc, (and so are unlike
> "Optional<int>" if such a thing existed) and you can readily tell if
> you have an Integer vs an int. On the other hand, you can still use
> ints as (autoboxed) objects inside collections etc without needing to
> have a special implementation just for ints (at the price of
> now-famous space bloats).
>
> Any language/library that embraces both of these notions together has
> to do something that is not identical to either pure FP or OO
> approaches. Some languages get a foothold by distinguishing object
> types from value types. Thus, nullness applies to objects,
> optionalness applies to values.  So, Scala, Lime, etc have variants
> of:
>
> 1. value types: int, double etc
> 2. Optional<V>: the result of partial functions on value types
> 3. object types (Object and subclasses)
> 4. refs: possibly null references to objects
>
>
> We don't have this foothold.  Arguably, because of this, we should not
> be creating such frameworks. But we are.
>
> So the choices are:
>
> A. Pretend we have value types. Introduce Optional for use with any
> value-like things, along with some set of conventions about how they
> interact with objects and possibly null refs.
>
> B. Don't pretend we have value types unless/until we have them.  Use
> the standard OO conventions, in which boxing classes like Integers are
> used when you need to elevate a value to objecthood. And when you have
> one, you have a full-fledged object, not just an invisible pointer.
> And when you don't have one, you just have null.
>
> Choice A is tempting because of its familiarity to programmers with
> an FP background. But doing so forces a never-ending set of bandaids
> (as we've seen lately) because none of the rules for interoperating
> with Object conventions make much sense.
>
> Sticking with (B) is less tempting to some people not only because
> they like to think of some of their classes in value-like ways, but
> also because streams (like java.util.concurrent) would need to
> relentlessly maintain the "null means nothing there" policy. So,
> emptyStream.reduce(f) must return null, null elements appearing in
> streams must be skipped, etc. But not only is this the most defensible
> policy to use in the absence of true value types, it is best suited to
> kludgelessly evolve to embrace value types if they are ever supported.
>
> There is also a choice C: always throw exceptions for partial
> functions / nothing-there cases. The logic of this is fine, and is
> completely reasonable when nothing-there-ness is accidental or
> exceptional. But the world voted against the painfulness and
> inefficiency of everyday programming under this encoding of
> nothing-there decades ago.
>
> Summary: get rid of Optional. Use null consistently to mean nothing
> there (plus exceptions in exceptional cases). Use the standard boxed
> types for numerics. Until/unless there are value types, create
> intStream etc as a separate set of classes with merely analogous APIs.
> (And while we are at it, add LongKeyHashMap and a few others!)
> Don't worry about people who used null as "meaningful" elements, map
> keys or map values.  No one is forcing them to use streams.
>
> -Doug
>
>
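
For reference, the "null means nothing there" policy described in the 
quoted message can be sketched roughly as below.  ConcurrentHashMap's 
rejection of null values is existing j.u.c behavior; reduceOrNull is a 
hypothetical helper written only to illustrate what 
emptyStream.reduce(f) returning null and skipping null elements would 
look like, and is not part of any API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BinaryOperator;

public class NullMeansNothing {
    // Hypothetical helper: skips null elements and returns null when there is
    // nothing to reduce. Assumes f itself never returns null.
    static <T> T reduceOrNull(List<T> xs, BinaryOperator<T> f) {
        T acc = null;
        for (T x : xs) {
            if (x == null) continue;                   // nulls are skipped
            acc = (acc == null) ? x : f.apply(acc, x);
        }
        return acc;                                    // null means "nothing there"
    }

    // j.u.c maps refuse null values, so a null from get() can only mean "absent".
    static boolean rejectsNullValue() {
        try {
            new ConcurrentHashMap<String, String>().put("k", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(reduceOrNull(Collections.<Integer>emptyList(), Integer::sum)); // null
        System.out.println(reduceOrNull(Arrays.asList(1, null, 2), Integer::sum));        // 3
        System.out.println(rejectsNullValue());                                           // true
    }
}
```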


More information about the lambda-libs-spec-observers mailing list