Sentinels in collections, was RE: Primitive Queue<any T> considerations

Rezaei, Mohammad A. Mohammad.Rezaei at gs.com
Thu Nov 19 15:19:10 UTC 2015


On Nov 18, 2015 11:43 PM, "John Rose" <john.r.rose at oracle.com<mailto:john.r.rose at oracle.com>> wrote:

                The int type has such a limitation, in that somebody has to choose (and manufacture) an int value the doesn't look like a valid payload.

That’s true if we’re talking about the Valhalla reified type int. For the plain old primitive int, that’s not the case. For contrast with code a bit further down, I’ll write it out:

public class IntHashSet {  // all possible values can be used by the user.
                private static final int EMPTY = 0; // does not leak outside
                private static final int REMOVED = 1; // does not leak outside
                private int[] entries; // contains EMPTY, REMOVED or user provided value
                private byte oneAndZero; // stores the set’s containment of 0 and 1

                public void add(int v) {
                                if (v == EMPTY) oneAnZero |= 1;
                                else if (v == REMOVED) oneAndZero |= 2;
                                else addToEntries(v);
                }

private void addToEntries(int v) {
                int index = indexOf(v);
                if (entries[index] == EMTPY || entries[index] == REMOVED || entries[index] == v) entries[index] = v;
                else findAnotherSlot(v);
};
}

Similar treatment works for map<int, V>.

If we’re willing to pay for a tag, and occasionally get lucky and pay nothing, all sorts of possibilities open up. From a complexity/optimization budget perspective, I’d rather see that spent on a more general purpose concept: union types. The above code would look like this with union types (excuse the syntax):

public class AnyfiedSet<any V> {
                private static __value EMPTY {} // member-less value type
                private static __value REMOVED {} // member-less value type
                private static __unionType STORAGE_TYPE { EMPTY, RESERVED, V}

                private STORAGE_TYPE[] entries;

                public void add(V v)
                {
                                // Vitaly, with tagged structures, you wouldn’t examine v at this point, but we’ll need instanceof or a way to examine the tag
                                int index = indexOf(v);
// == becomes instanceof:
                                if (entries[index] instanceof EMPTY || entries[index] instanceof REMOVED || entries[index] instanceof V && ((V)entries[index])== v)
entries[index] = v;
                                else findAnotherSlot(v);
                }
}

Sadly, no amount of optimization will make this as good as the first implementation for the existing crop of integer-like primitives. While primitive collections libraries will benefit from any-fied interfaces, for the highly optimized space they compete in, there will be a per-type implementation even with Valhalla.

I’m generally a fan of letting the VM pick the layout, so long as I can query it (reflection, unsafe, etc). That does open up a lot of optimization as you mention below. In particular, your sentinel proposal can be subsumed into yet another useful construct: enum value type. If I can defined:

                __enum_value Sentinel {EMPTY, REMOVED, UNSENTINEL}

Then all I need is a Pair<Sentinel, V> and the VM can decide to fit all the bits in the storage of V, if V has 2 extra bits to spare.

Thanks
Moh

From: John Rose [mailto:john.r.rose at oracle.com]
Sent: Wednesday, November 18, 2015 11:44 PM
To: Rezaei, Mohammad A. [Tech]
Cc: Vitaly Davidovich; Richard Warburton; valhalla-dev at openjdk.java.net
Subject: Re: Sentinels in collections, was RE: Primitive Queue<any T> considerations

On Nov 18, 2015, at 5:15 PM, Rezaei, Mohammad A. <Mohammad.Rezaei at gs.com<mailto:Mohammad.Rezaei at gs.com>> wrote:

From a Valhalla perspective, the fact that one cannot manufacture a sentinel besides T.default is a real problem for collections that need more than one sentinel. The current types (primitives, references) don’t have this limitation.



(This is also not a problem in languages like C where I can cast a struct to char* and fill it with arbitrary bit patterns).

OK, I'll bite.  If this is important enough, we can generalize optional<T> to a new wrapper type that allocates more than one extra bit pattern.  Suppose there were a value type sentinel<T,X>, where X is a type unique to the sentinel usage, backed by T plus "epsilon" bits.  The "epsilon" is enough to distinguish the X state from any similar state (or the optional.isPresent state) in the underlying type T.  I'm not saying this is easy or natural to express in the JVM, but it logically expresses the requirement.

[1]class MyQueue<T> {
  private sentinel<optional<T>,MyQueue>[] buf;
  …
  private atSentinel() { return buf[current].isSentinel(X.class); }
  public optional<T> getOptional() { return buf[current].getNonSentinel(X.class); }
  public T get() { return getOptional().get(); }
}

The naive implementation is that, like optional, each layer of "sentinel-ness" adds an extra boolean of envelope around the underlying value.  But, given some ad hoc JVM optimizations, some day, up to 255 sentinel and optional layers could share a tag byte expressing the various possible levels of "not present", and/or use the T payload itself to express many (2**32) layers of "not present" with an extra bit of tag.  Thus, "epsilon" can be 8 or even 1.

In the tag-byte case (eps = 8), this gives you up to 255 sentinel values per value type.  Now, if you add another trick to the JVM, it could notice if T has slack bits in it (uncompressed pointers have 3-4 slack bits, while doubles have over 50 slack bits) and place the "not present" codes into the T bits, with no additional tags, not even one bit.  In that case, "epsilon" = 0 and everybody is happy, since they have re-invented "null", but everybody gets their own "null" (the X parameter).

This stuff *can* be done, and maybe in the future we will decide to do it.  (Optimization budget, right Vitaly?)  By making the JVM master of the value type layouts, we can reserve the option to make such optimizations.

— John


More information about the valhalla-dev mailing list