Primitive streams and optional

Sat Nov 24 11:52:05 PST 2012

Doug's suggestion of ignoring null gets my vote. I used the same approach in my own parallel library and it worked well. It also is consistent with null treatment in some languages, like Objective-C. 

The approach taken by Objective-C is instructive because, like Java, it is a mixed language with values and objects. There is a caveat in looking to Objective-C for inspiration, however: in Objective-C, (the equivalent of) null.method() does not throw an NPE; it is a no-op. Ignoring null is therefore ingrained throughout the language. 

But as I said, ignoring null worked well for me. In the PSs below there is more detail about how I handled null. 

-- Howard. 

PS In my parallelisation library I split the data into a doubly linked list of segments. Each segment is the same size and is padded with null as necessary. Therefore all my ops (the equivalents of map, filter, reduce, etc.) preserve the segment size, which makes parallelisation easy. If, after processing, a segment is largely empty, it may be merged with the segments on either side. If a segment is completely empty, or is not required after a subList operation, it is dropped. 
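
A rough sketch of the padded-segment idea in Java follows. It is not the library's code: the names Segment, of, capacity, and liveCount are made up for illustration, and the linked-list wiring and segment merging are left out. It only shows a map that skips null slots and keeps the segment size unchanged.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not the actual library.
final class Segment<T> {
    // Fixed capacity; unused or dropped positions hold null.
    private final Object[] slots;

    Segment(int capacity) {
        slots = new Object[capacity];
    }

    // Fills the first values.size() slots; the rest stay null (padding).
    static <T> Segment<T> of(int capacity, List<T> values) {
        if (values.size() > capacity) {
            throw new IllegalArgumentException("too many values for segment");
        }
        Segment<T> s = new Segment<>(capacity);
        for (int i = 0; i < values.size(); i++) {
            s.slots[i] = values.get(i);
        }
        return s;
    }

    int capacity() {
        return slots.length;
    }

    @SuppressWarnings("unchecked")
    T get(int i) {
        return (T) slots[i];
    }

    // The result has the same capacity as the input. Null inputs are
    // skipped, and a null result from f simply leaves a gap.
    <R> Segment<R> map(Function<? super T, ? extends R> f) {
        Segment<R> out = new Segment<>(slots.length);
        for (int i = 0; i < slots.length; i++) {
            T v = get(i);
            if (v != null) {
                out.slots[i] = f.apply(v);
            }
        }
        return out;
    }

    // Number of non-null slots; a merge heuristic could consult this.
    int liveCount() {
        int n = 0;
        for (Object o : slots) {
            if (o != null) {
                n++;
            }
        }
        return n;
    }
}
```

Because every op returns a segment of the same capacity, each segment can be processed independently without renumbering its neighbours.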

PPS Returning null from a Mapper is the equivalent of a filter operation. In fact Filter 'is-a' Map that returns null or the value. 
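
To make the "Filter is-a Map" point concrete, here is a minimal Java sketch. The class FilterAsMap and the methods filter and mapIgnoringNull are hypothetical names, and a plain List stands in for a segment; the only assumption carried over from above is that a null result means "nothing there".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration only; not the actual library.
final class FilterAsMap {
    // A filter expressed as a mapper: return the value to keep it,
    // or null to drop it.
    static <T> Function<T, T> filter(Predicate<? super T> keep) {
        return v -> keep.test(v) ? v : null;
    }

    // A map that ignores null inputs and discards null results.
    static <T, R> List<R> mapIgnoringNull(List<T> in,
                                          Function<? super T, ? extends R> f) {
        List<R> out = new ArrayList<>();
        for (T v : in) {
            if (v == null) {
                continue;
            }
            R r = f.apply(v);
            if (r != null) {
                out.add(r);
            }
        }
        return out;
    }
}
```

With this in place, mapIgnoringNull(values, filter(p)) behaves like a conventional filter, so no separate filter op is needed.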

PPPS I also allow throw BREAK and throw CONTINUE, where BREAK and CONTINUE are pre-made static fields and therefore don't incur a creation overhead. Throwing CONTINUE is equivalent to returning null from a Mapper. Throwing BREAK in a Mapper terminates all the parallel operations, pads the remainder of the segment with null, and discards the subsequent segments. 
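
A sequential Java sketch of the BREAK/CONTINUE signals might look like the following. The names ControlSignals and Signal are hypothetical, and disabling stack-trace capture in the constructor is an added assumption (the text above only says the instances are pre-made static fields). The parallel termination and segment padding are not modelled; a single list stands in for one segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not the actual library.
final class ControlSignals {
    // Preallocated signal. The four-argument RuntimeException constructor
    // disables suppression and the writable stack trace (an assumption here),
    // so throwing a shared instance allocates nothing.
    static final class Signal extends RuntimeException {
        Signal(String name) {
            super(name, null, false, false);
        }
    }

    static final Signal BREAK = new Signal("BREAK");
    static final Signal CONTINUE = new Signal("CONTINUE");

    // Sequential stand-in for processing one segment.
    static <T, R> List<R> map(List<T> in, Function<? super T, ? extends R> f) {
        List<R> out = new ArrayList<>();
        for (T v : in) {
            if (v == null) {
                continue;
            }
            try {
                R r = f.apply(v);
                if (r != null) {
                    out.add(r);
                }
            } catch (Signal s) {
                if (s == BREAK) {
                    break; // stop; later elements are discarded
                }
                // CONTINUE: same effect as the mapper returning null
            }
        }
        return out;
    }
}
```

A mapper can then write throw ControlSignals.CONTINUE to skip an element or throw ControlSignals.BREAK to stop early.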


On 25/11/2012, at 4:13 AM, Doug Lea <dl at cs.oswego.edu> wrote:

> 
> Just in case anyone is interested in re-deciding some basics in light
> of the continuing saga of unappealing API choices, here's one last
> push for adopting the j.u.c null policies in streams.
> 
> Sorry that I can't think of a good way to present this without
> stepping back into prehistory!
> 
> Long ago (1950s), people noticed that there are two basic flavors of
> data: values and pointers. A value is just, um, a value. A pointer
> differs conceptually in that it might not point to anything. Hence the
> invention of null, as a special state of a pointer, that for economy,
> is encoded as the special value zero if null, else a (possibly
> virtualized etc) memory address. (One disadvantage of this encoding is
> that it loses type information -- an early form of "erasure". A null
> pointer to an int looks the same as a null pointer to a double, etc.)
> 
> Only slightly less long ago (late 1960s), people noticed that
> pointer-like notions could be elevated to the idea of "references to
> objects" (in early forms, an object's pointer address was its
> identity). But still with the notion that a reference might not point
> anywhere.
> 
> So, now we have four different concepts:
> 1. values
> 2. possibly null pointers to values
> 3. objects
> 4. possibly null references to objects
> 
> The possibly-null case naturally occurs with partial functions and
> methods, often related to lookup/search: get the thing at some
> uninitialized array position, or in a hash map without a binding,
> etc. Also for terminals in linked data structures.  You need some way
> to say that there is no such thing there.
> 
> The FP (and ADT) folks had an arguably easier time of this, since they
> only encountered cases (1) and (2). Still they had the notion of a
> compound-value, which is like an object, but has no defined identity.
> Any partial function that "should" return an X but need not can
> instead return an Optional<X>. And the most common technique for
> implementing this notion is to "box" the value when present, else
> return null. The programmer is never exposed to this, though.
> For example, using "==" on a boxed vs unboxed int does the same thing
> (comparing values, not the "invisible" pointers).
> 
> The pure OO (smalltalk etc) folks also in principle had an easier
> time, since they conceptually dealt only with cases (3) and (4).
> Since everything is an object, everything worked uniformly. (Although
> many people now think in retrospect that "nullable" should have been a
> required part of any method return type spec so that programmers know
> when nulls might legitimately vs accidentally appear. JSR308 might
> help with this though.)  However people don't appreciate it when "=="
> always compares pointers (among other issues) for integers, so special
> rules were made for these cases, that are basically the inverse of the
> FP approach.  That is, in FP, pointerness is hidden, in OO,
> pointerless-valueness is hidden. But less hidden. For example, Integers
> are objects with identity, monitors, etc, (and so are unlike
> "Optional<int>" if such a thing existed) and you can readily tell if
> you have an Integer vs an int. On the other hand, you can still use
> ints as (autoboxed) objects inside collections etc without needing to
> have a special implementation just for ints (at the price of
> now-famous space bloats).
> 
> Any language/library that embraces both of these notions together has
> to do something that is not identical to either pure FP or OO
> approaches. Some languages get a foothold by distinguishing object
> types from value types. Thus, nullness applies to objects,
> optionalness applies to values.  So, Scala, Lime, etc have variants
> of:
> 
> 1. value types: int, double etc
> 2. Optional<V>: the result of partial functions on value types
> 3. object types (Object and subclasses)
> 4. refs: possibly null references to objects
> 
> 
> We don't have this foothold.  Arguably, because of this, we should not
> be creating such frameworks. But we are.
> 
> So the choices are:
> 
> A. Pretend we have value types. Introduce Optional for use with any
> value-like things, along with some set of conventions about how they
> interact with objects and possibly null refs.
> 
> B. Don't pretend we have value types unless/until we have them.  Use
> the standard OO conventions, in which boxing classes like Integers are
> used when you need to elevate a value to objecthood. And when you have
> one, you have a full-fledged object, not just an invisible pointer.
> And when you don't have one, you just have null.
> 
> Choice A is tempting because of its familiarity to programmers with an
> FP background. But doing so forces a never-ending set of bandaids
> (as we've seen lately) because none of the rules for interoperating
> with Object conventions make much sense.
> 
> Sticking with (B) is less tempting to some people not only because
> they like to think of some of their classes in value-like ways, but
> also because streams (like java.util.concurrent) would need to
> relentlessly maintain the "null means nothing there" policy. So,
> emptyStream.reduce(f) must return null, null elements appearing in
> streams must be skipped, etc. But not only is this the most defensible
> policy to use in the absence of true value types, it is best suited to
> kludgelessly evolve to embrace value types if they are ever supported.
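
As a minimal sketch of the "null means nothing there" policy described in the paragraph above, a reduce over a plain List could be written as follows. The class NullPolicy and its reduce method are hypothetical and are not the proposed streams API; the sketch only shows an empty input yielding null and null elements being skipped.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical illustration only; not the java.util.stream API.
final class NullPolicy {
    // Returns null when there is nothing to reduce; null elements are skipped.
    // Caveat: if f itself returns null, the accumulator restarts at the next
    // non-null element.
    static <T> T reduce(List<T> elements, BinaryOperator<T> f) {
        T acc = null;
        for (T e : elements) {
            if (e == null) {
                continue; // null means "nothing there"
            }
            acc = (acc == null) ? e : f.apply(acc, e);
        }
        return acc;
    }
}
```

Under this policy the caller checks the result against null instead of unwrapping an Optional.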
> 
> There is also a choice C: always throw exceptions for partial
> functions / nothing-there cases. The logic of this is fine, and is
> completely reasonable when nothing-there-ness is accidental or
> exceptional. But the world voted against the painfulness and
> inefficiency of everyday programming under this encoding of
> nothing-there decades ago.
> 
> Summary: get rid of Optional. Use null consistently to mean nothing
> there (plus exceptions in exceptional cases). Use the standard boxed
> types for numerics. Until/unless there are value types, create
> intStream etc as a separate set of classes with merely analogous APIs.
> (And while we are at it, add LongKeyHashMap and a few others!)
> Don't worry about people who used null as "meaningful" elements, map
> keys or map values.  No one is forcing them to use streams.
> 
> -Doug
> 
>