valhalla-dev Digest, Vol 6, Issue 3

Peter Levart peter.levart at gmail.com
Fri Dec 26 10:45:58 UTC 2014


Hi Thomas,

My thoughts on sentinels, etc...

On 12/26/2014 02:28 AM, Thomas W wrote:
> There is much code already -- Map.get() -- returning a sentinel for "not
> found". My proposal is to expressly recognize that; and ensure that better
> values than 0 are used.
>
> My initial proposal was to use -1;  not perfect I agree, but the bulk of C
> library APIs has been happy to do without this number for a very long time
> :)  However, I'm happy to revise it to Integer.MIN_VALUE. 'double' could use
> either MIN_VALUE or NaN. 'boolean' will have false, and that's the best we
> can do.
>
> Since efficiency is the main reason to specialize, we shouldn't require
> Map.get()'s single-word return to be "blown out" to two; nor should
> wrapping in Optional be required.

If value types are done right, then "wrapping" a value type inside 
another value type is not actually wrapping (like we are used to call it 
with reference types), but embedding. The return of Optional<int> or any 
Optional<ValueType> does not represent an implementation overhead of 
another indirection like with reference types.  If all goes well, then I 
think the plan is for "blowing out" a single word int return to double 
word Optional<int> means the additional word will be assigned to another 
CPU register in machine code - hardly something we can measure.

> Unless we're going to drop Map.get() from the API, which we shouldn't, a
> sentinel is proveably the only solution.
>
> As with Objects, anybody who actually needs to distinguish
> Integer.MIN_VALUE from "not present" in Map<?,int> can use the same
> approaches as present:
> 1)  check containsKey() first, then call get()

Not good for ConcurrentMap(s).

> 2)  use getOrDefault()

Unless all the values are needed.

> 3)  "mask" the sentinel with another value.
>
> Usage of the sentinel (for single-return APIs, or -- less recommended --
> for internal storage) will be up to the collection author.

I think that the desire to use a sentinel for internal storage in 
value-type based data structures such as collections is greater than 
just for returns from get() or such which are not problematic at all if 
Optional<ValueType> is used instead. Take for example a HashMap<any K, 
any V>. What would be the most efficient representation of specialized 
HashMap<int, int> for example? Perhaps a linear-scan hash table 
implemented as Map.Entry<any K, any V>[] array. So what is the most 
efficient representation of specialized Map.Entry<int, int> ? Here, 
using a sentinel value for "not-present" key can save 1/3 or even 1/2 
(depends on alignment constraints on various architectures) of memory 
used for such a structure. With larger value types as keys and/or 
values, additional flag in Map.Entry implementation does not matter much 
any more. So we might need two implementations - with and without 
sentinel keys.

> Given the efficiency of a single machine-word, rather than two, and the
> absence of multiple-returns or out-parameters from the Java language, this
> solution is sane, efficient & suitable for performant use.

But value types as return types "are" in effect multiple-returns, 
packaged in a disguise of a class-like thing.


Regards, Peter




More information about the valhalla-dev mailing list