RFR(m): 8139233 add initial compact immutable collection implementations

Peter Levart peter.levart at gmail.com
Thu May 5 16:27:09 UTC 2016


Hi,


On 05/05/2016 02:54 PM, Stephen Colebourne wrote:
> On 5 May 2016 at 02:30, Stuart Marks <stuart.marks at oracle.com> wrote:
>>> I disagree with altering the iteration order. Guava's ImmutableSet and
>>> ImmutableMap have reliable iteration specified to match creation
>>> order. This aspect of the design is very useful.
>>
>> I think this is a reasonable thing to want, but it would be a different API.
>> The current Set.of() and Map.of() APIs have unspecified iteration order, and
>> I want to "enforce" that via randomizing the iteration order. Having
>> unspecified iteration order, but with the implementation giving a stable
>> order, is the worst of both worlds. It lets clients bake in inadvertent
>> order dependencies, and then when we do change it, it breaks them. (This
>> seems to happen every couple years with HashMap.) Randomizing iteration
>> order helps preserve implementation freedom.
>>
>> That said, it's a fine thing to have a different API that gives a Set or Map
>> with a stable iteration order. Several people have asked for this. I've
>> filed JDK-8156070 to track this.
> To be clear, I believe that the spec of Set.of() and Map.of() should
> require iteration order matching insertion order (because the order is
> very clearly defined in this case and anything else will be
> surprising).
>
> Stephen

So what if the API was specified to have iteration order matching 
construction order. Are you afraid that would impact implementations so 
that they would be less compact or less performant? At least for small 
number of elements, there is a possible implementation that keeps 
construction order and is even more compact and still has O(1) lookup 
performance. For example an implementation for up to 255 elements:

public class Set0xFF<E> extends AbstractSet<E> implements Serializable {

     /**
      * A "salt" value used for randomizing iteration order. This is 
initialized once
      * and stays constant for the lifetime of the JVM. It need not be 
truly random, but
      * it needs to vary sufficiently from one run to the next so that 
iteration order
      * will vary between JVM runs.
      */
     static final int SALT;

     static {
         SALT = new Random().nextInt();
     }

     /**
      * The reciprocal of load factor. Given a number of elements
      * to store, multiply by this factor to get the table size.
      */
     static final double EXPAND_FACTOR = 2.0;

     final E[] elements;
     final byte[] index;

     @SafeVarargs
     @SuppressWarnings("unchecked")
     Set0xFF(E... input) {
         if (input.length > 255) {
             throw new IllegalArgumentException("To many elements for 
this implementation");
         }
         elements = input.clone();
         index = new byte[(int) Math.ceil(EXPAND_FACTOR * elements.length)];
         for (int i = 0; i < elements.length; i++) {
             E e = Objects.requireNonNull(elements[i]);
             int idx = probe(e);
             if (idx >= 0) {
                 throw new IllegalArgumentException("duplicate element: 
" + e);
             } else {
                 index[-(idx + 1)] = (byte) (i + 1);
             }
         }
     }

     @Override
     public int size() {
         return elements.length;
     }

     @Override
     @SuppressWarnings("unchecked")
     public boolean contains(Object o) {
         Objects.requireNonNull(o);
         return probe((E) o) >= 0;
     }

     @Override
     public Iterator<E> iterator() {
         return new Iterator<E>() {
             int idx = 0;

             @Override
             public boolean hasNext() {
                 return idx < elements.length;
             }

             @Override
             public E next() {
                 int i = idx;
                 if (i >= elements.length) {
                     throw new NoSuchElementException();
                 }
                 idx = i + 1;
                 return elements[i];
             }
         };
     }

     // returns index into index array at which element index + 1 is 
present; or if absent,
     // (-i - 1) where i is location where element index should be inserted
     private int probe(E pe) {
         int idx = Math.floorMod(pe.hashCode() ^ SALT, index.length);
         while (true) {
             int i = (int) index[idx] & 0xFF;
             if (i == 0) {
                 return -idx - 1;
             } else if (elements[i - 1].equals(pe)) {
                 return idx;
             } else if (++idx == index.length) {
                 idx = 0;
             }
         }
     }
}


Your SetN occupies (without considering the elements themselves) the 
following number of bytes:

     8 * size() (on 32 bit/compressed OOPS) or 16 * size() (on 64bit 
OOPS) (+ 2 object headers and alignments)

Above Set0xFF occupies:

     6 * size() (on 32 bit/compressed OOPS) or 10 * size() (on 64bit 
OOPS) (+ 3 object headers and alignments)

An analogue Set0xFFFF for up to 65535 elements would occupy:

     8 * size() (on 32 bit/compressed OOPS) or 12 * size() (on 64bit 
OOPS) (+ 3 object headers and alignments)

An analogue Set0xFFFFFFFF for up to 2^31 - 1 elements that still keeps 
iteration order would occupy:

     12 * size() (on 32 bit/compressed OOPS) or 16 * size() (on 64bit 
OOPS) (+ 3 object headers and alignments)


So only on 32 bit/compressed OOPS for # of elements in range 2^16 ... 
2^31 - 1 there is 1.5 x footprint increase.


Worth considering?

Regards, Peter




More information about the core-libs-dev mailing list