RFR(m): 8139233 add initial compact immutable collection implementations
Peter Levart
peter.levart at gmail.com
Thu May 5 16:27:09 UTC 2016
Hi,
On 05/05/2016 02:54 PM, Stephen Colebourne wrote:
> On 5 May 2016 at 02:30, Stuart Marks <stuart.marks at oracle.com> wrote:
>>> I disagree with altering the iteration order. Guava's ImmutableSet and
>>> ImmutableMap have reliable iteration specified to match creation
>>> order. This aspect of the design is very useful.
>>
>> I think this is a reasonable thing to want, but it would be a different API.
>> The current Set.of() and Map.of() APIs have unspecified iteration order, and
>> I want to "enforce" that via randomizing the iteration order. Having
>> unspecified iteration order, but with the implementation giving a stable
>> order, is the worst of both worlds. It lets clients bake in inadvertent
>> order dependencies, and then when we do change it, it breaks them. (This
>> seems to happen every couple years with HashMap.) Randomizing iteration
>> order helps preserve implementation freedom.
>>
>> That said, it's a fine thing to have a different API that gives a Set or Map
>> with a stable iteration order. Several people have asked for this. I've
>> filed JDK-8156070 to track this.
> To be clear, I believe that the spec of Set.of() and Map.of() should
> require iteration order matching insertion order (because the order is
> very clearly defined in this case and anything else will be
> surprising).
>
> Stephen
So what if the API was specified to have iteration order matching
construction order. Are you afraid that would impact implementations so
that they would be less compact or less performant? At least for small
number of elements, there is a possible implementation that keeps
construction order and is even more compact and still has O(1) lookup
performance. For example an implementation for up to 255 elements:
public class Set0xFF<E> extends AbstractSet<E> implements Serializable {
/**
* A "salt" value used for randomizing iteration order. This is
initialized once
* and stays constant for the lifetime of the JVM. It need not be
truly random, but
* it needs to vary sufficiently from one run to the next so that
iteration order
* will vary between JVM runs.
*/
static final int SALT;
static {
SALT = new Random().nextInt();
}
/**
* The reciprocal of load factor. Given a number of elements
* to store, multiply by this factor to get the table size.
*/
static final double EXPAND_FACTOR = 2.0;
final E[] elements;
final byte[] index;
@SafeVarargs
@SuppressWarnings("unchecked")
Set0xFF(E... input) {
if (input.length > 255) {
throw new IllegalArgumentException("To many elements for
this implementation");
}
elements = input.clone();
index = new byte[(int) Math.ceil(EXPAND_FACTOR * elements.length)];
for (int i = 0; i < elements.length; i++) {
E e = Objects.requireNonNull(elements[i]);
int idx = probe(e);
if (idx >= 0) {
throw new IllegalArgumentException("duplicate element:
" + e);
} else {
index[-(idx + 1)] = (byte) (i + 1);
}
}
}
@Override
public int size() {
return elements.length;
}
@Override
@SuppressWarnings("unchecked")
public boolean contains(Object o) {
Objects.requireNonNull(o);
return probe((E) o) >= 0;
}
@Override
public Iterator<E> iterator() {
return new Iterator<E>() {
int idx = 0;
@Override
public boolean hasNext() {
return idx < elements.length;
}
@Override
public E next() {
int i = idx;
if (i >= elements.length) {
throw new NoSuchElementException();
}
idx = i + 1;
return elements[i];
}
};
}
// returns index into index array at which element index + 1 is
present; or if absent,
// (-i - 1) where i is location where element index should be inserted
private int probe(E pe) {
int idx = Math.floorMod(pe.hashCode() ^ SALT, index.length);
while (true) {
int i = (int) index[idx] & 0xFF;
if (i == 0) {
return -idx - 1;
} else if (elements[i - 1].equals(pe)) {
return idx;
} else if (++idx == index.length) {
idx = 0;
}
}
}
}
Your SetN occupies (without considering the elements themselves) the
following number of bytes:
8 * size() (on 32 bit/compressed OOPS) or 16 * size() (on 64bit
OOPS) (+ 2 object headers and alignments)
Above Set0xFF occupies:
6 * size() (on 32 bit/compressed OOPS) or 10 * size() (on 64bit
OOPS) (+ 3 object headers and alignments)
An analogue Set0xFFFF for up to 65535 elements would occupy:
8 * size() (on 32 bit/compressed OOPS) or 12 * size() (on 64bit
OOPS) (+ 3 object headers and alignments)
An analogue Set0xFFFFFFFF for up to 2^31 - 1 elements that still keeps
iteration order would occupy:
12 * size() (on 32 bit/compressed OOPS) or 16 * size() (on 64bit
OOPS) (+ 3 object headers and alignments)
So only on 32 bit/compressed OOPS for # of elements in range 2^16 ...
2^31 - 1 there is 1.5 x footprint increase.
Worth considering?
Regards, Peter
More information about the core-libs-dev
mailing list