RFR(m): 8139233 add initial compact immutable collection implementations

Thu May 5 17:04:00 UTC 2016

Where is the current commitment to unspecified iteration order?  Is that in
Java 9?

Generally speaking, I see no problem with going from unspecified to
specified iteration order; if code had to be able to deal with *any* order
previously they can certainly deal with an order that happens to be the
obvious sensible order.

Furthermore, the code that is easiest to use should be the simplest, most
desirable use case, and I'd definitely say that means using the predictable
order the elements were provided in.  Providing that as another, more
awkwardly named method seems deeply undesirable: Set.of and Map.of are the
short, easy to use method names, and they should have the simplest,
easiest-to-understand behavior: insertion order iteration.

On Thu, May 5, 2016 at 9:27 AM Peter Levart <peter.levart at gmail.com> wrote:

> Hi,
>
>
> On 05/05/2016 02:54 PM, Stephen Colebourne wrote:
> > On 5 May 2016 at 02:30, Stuart Marks <stuart.marks at oracle.com> wrote:
> >>> I disagree with altering the iteration order. Guava's ImmutableSet and
> >>> ImmutableMap have reliable iteration specified to match creation
> >>> order. This aspect of the design is very useful.
> >>
> >> I think this is a reasonable thing to want, but it would be a different
> API.
> >> The current Set.of() and Map.of() APIs have unspecified iteration
> order, and
> >> I want to "enforce" that via randomizing the iteration order. Having
> >> unspecified iteration order, but with the implementation giving a stable
> >> order, is the worst of both worlds. It lets clients bake in inadvertent
> >> order dependencies, and then when we do change it, it breaks them. (This
> >> seems to happen every couple years with HashMap.) Randomizing iteration
> >> order helps preserve implementation freedom.
> >>
> >> That said, it's a fine thing to have a different API that gives a Set
> or Map
> >> with a stable iteration order. Several people have asked for this. I've
> >> filed JDK-8156070 to track this.
> > To be clear, I believe that the spec of Set.of() and Map.of() should
> > require iteration order matching insertion order (because the order is
> > very clearly defined in this case and anything else will be
> > surprising).
> >
> > Stephen
>
> So what if the API was specified to have iteration order matching
> construction order. Are you afraid that would impact implementations so
> that they would be less compact or less performant? At least for small
> number of elements, there is a possible implementation that keeps
> construction order and is even more compact and still has O(1) lookup
> performance. For example an implementation for up to 255 elements:
>
> public class Set0xFF<E> extends AbstractSet<E> implements Serializable {
>
>      /**
>       * A "salt" value used for randomizing iteration order. This is
> initialized once
>       * and stays constant for the lifetime of the JVM. It need not be
> truly random, but
>       * it needs to vary sufficiently from one run to the next so that
> iteration order
>       * will vary between JVM runs.
>       */
>      static final int SALT;
>
>      static {
>          SALT = new Random().nextInt();
>      }
>
>      /**
>       * The reciprocal of load factor. Given a number of elements
>       * to store, multiply by this factor to get the table size.
>       */
>      static final double EXPAND_FACTOR = 2.0;
>
>      final E[] elements;
>      final byte[] index;
>
>      @SafeVarargs
>      @SuppressWarnings("unchecked")
>      Set0xFF(E... input) {
>          if (input.length > 255) {
>              throw new IllegalArgumentException("To many elements for
> this implementation");
>          }
>          elements = input.clone();
>          index = new byte[(int) Math.ceil(EXPAND_FACTOR *
> elements.length)];
>          for (int i = 0; i < elements.length; i++) {
>              E e = Objects.requireNonNull(elements[i]);
>              int idx = probe(e);
>              if (idx >= 0) {
>                  throw new IllegalArgumentException("duplicate element:
> " + e);
>              } else {
>                  index[-(idx + 1)] = (byte) (i + 1);
>              }
>          }
>      }
>
>      @Override
>      public int size() {
>          return elements.length;
>      }
>
>      @Override
>      @SuppressWarnings("unchecked")
>      public boolean contains(Object o) {
>          Objects.requireNonNull(o);
>          return probe((E) o) >= 0;
>      }
>
>      @Override
>      public Iterator<E> iterator() {
>          return new Iterator<E>() {
>              int idx = 0;
>
>              @Override
>              public boolean hasNext() {
>                  return idx < elements.length;
>              }
>
>              @Override
>              public E next() {
>                  int i = idx;
>                  if (i >= elements.length) {
>                      throw new NoSuchElementException();
>                  }
>                  idx = i + 1;
>                  return elements[i];
>              }
>          };
>      }
>
>      // returns index into index array at which element index + 1 is
> present; or if absent,
>      // (-i - 1) where i is location where element index should be inserted
>      private int probe(E pe) {
>          int idx = Math.floorMod(pe.hashCode() ^ SALT, index.length);
>          while (true) {
>              int i = (int) index[idx] & 0xFF;
>              if (i == 0) {
>                  return -idx - 1;
>              } else if (elements[i - 1].equals(pe)) {
>                  return idx;
>              } else if (++idx == index.length) {
>                  idx = 0;
>              }
>          }
>      }
> }
>
>
> Your SetN occupies (without considering the elements themselves) the
> following number of bytes:
>
>      8 * size() (on 32 bit/compressed OOPS) or 16 * size() (on 64bit
> OOPS) (+ 2 object headers and alignments)
>
> Above Set0xFF occupies:
>
>      6 * size() (on 32 bit/compressed OOPS) or 10 * size() (on 64bit
> OOPS) (+ 3 object headers and alignments)
>
> An analogue Set0xFFFF for up to 65535 elements would occupy:
>
>      8 * size() (on 32 bit/compressed OOPS) or 12 * size() (on 64bit
> OOPS) (+ 3 object headers and alignments)
>
> An analogue Set0xFFFFFFFF for up to 2^31 - 1 elements that still keeps
> iteration order would occupy:
>
>      12 * size() (on 32 bit/compressed OOPS) or 16 * size() (on 64bit
> OOPS) (+ 3 object headers and alignments)
>
>
> So only on 32 bit/compressed OOPS for # of elements in range 2^16 ...
> 2^31 - 1 there is 1.5 x footprint increase.
>
>
> Worth considering?
>
> Regards, Peter
>
>