Sentinels in collections, was RE: Primitive Queue<any T> considerations

John Rose john.r.rose at oracle.com
Thu Nov 19 04:43:45 UTC 2015


On Nov 18, 2015, at 5:15 PM, Rezaei, Mohammad A. <Mohammad.Rezaei at gs.com> wrote:
> 
> From a Valhalla perspective, the fact that one cannot manufacture a sentinel besides T.default is a real problem for collections that need more than one sentinel. The current types (primitives, references) don’t have this limitation.

The int type has such a limitation, in that somebody has to choose (and manufacture) an int value the doesn't look like a valid payload.

> (This is also not a problem in languages like C where I can cast a struct to char* and fill it with arbitrary bit patterns).

OK, I'll bite.  If this is important enough, we can generalize optional<T> to a new wrapper type that allocates more than one extra bit pattern.  Suppose there were a value type sentinel<T,X>, where X is a type unique to the sentinel usage, backed by T plus "epsilon" bits.  The "epsilon" is enough to distinguish the X state from any similar state (or the optional.isPresent state) in the underlying type T.  I'm not saying this is easy or natural to express in the JVM, but it logically expresses the requirement.

class MyQueue<T> {
  private sentinel<optional<T>,MyQueue>[] buf;
  …
  private atSentinel() { return buf[current].isSentinel(X.class); }
  public optional<T> getOptional() { return buf[current].getNonSentinel(X.class); }
  public T get() { return getOptional().get(); }
}

The naive implementation is that, like optional, each layer of "sentinel-ness" adds an extra boolean of envelope around the underlying value.  But, given some ad hoc JVM optimizations, some day, up to 255 sentinel and optional layers could share a tag byte expressing the various possible levels of "not present", and/or use the T payload itself to express many (2**32) layers of "not present" with an extra bit of tag.  Thus, "epsilon" can be 8 or even 1.

In the tag-byte case (eps = 8), this gives you up to 255 sentinel values per value type.  Now, if you add another trick to the JVM, it could notice if T has slack bits in it (uncompressed pointers have 3-4 slack bits, while doubles have over 50 slack bits) and place the "not present" codes into the T bits, with no additional tags, not even one bit.  In that case, "epsilon" = 0 and everybody is happy, since they have re-invented "null", but everybody gets their own "null" (the X parameter).

This stuff *can* be done, and maybe in the future we will decide to do it.  (Optimization budget, right Vitaly?)  By making the JVM master of the value type layouts, we can reserve the option to make such optimizations.

— John


More information about the valhalla-dev mailing list