Do Set implementations waste memory?
Osvaldo Doederlein
opinali at gmail.com
Thu Mar 18 13:22:47 UTC 2010
Hi,
I've tread the google-groups thread, it seems you didn't test on a 64-bit
VM. Could you do that test, with and without CompressedOops, and using
latest HotSpot (7b85 or 6u20ea)? I guess we should see advantages in both
memory savings and speed, at least with CompressedOops.
It is too easy to dismiss an optimization on the basis of "doesn't deliver
benefit on a particular VM". It may be good on a different implementation,
or a different architecture like 32 vs. 64 bits. Object headers, field
layouts, alignments etc., are not portable, and the best rule of thumb is
that any removed field _will_ reduce memory usage at least in some
implementation.
The oldest collection classes were designed for the needs of J2SE 1.2, a
full decade ago. This was discussed before, IIRC there was some reply from
Josh agreeing that some speeed-vs-size tradeoffs made last decade should be
revisited today. The extra runtime size/bloat that a specialized HashSet
implementation would cost, was reasonably significant in 1999 but completely
irrelevant in 2010. I mean, HashSet is a HUGELY important collection, and
the benefit of any optimization of its implementation would spread to many
APIs and applications.
And the problem is not only the extra value field, there is also overhead
from the extra indirection (plus extra polymorphic call) from the HashSet
object to the internal HashMap object. This overhead may sometimes be
sufficient to block inlining and devirtualization, so it's a potentially
bigger cost than just a single extra memory load (which is easily hoisted
out of loops etc.). Look at this code inside HashSet for a much worse cost:
public Iterator<E> iterator() {
return map.keySet().iterator();
}
Yeah we pay the cost of building the internal HashMap's key-set (which is
lazily-built), just to iterate the freaking HashSet. (Notice that
differently from HashMap, a Set is a true Collection that we can iterate
directly without any view-collection of keys/values/entries.)
IMHO all this adds evidence that the current HashSet implementation is a
significant performance bug. We need a brand-new impl that does the hashing
internally, without relying on HashMap, without any unused fields, extra
indirections, or surprising costs like that for iterator(). I guess it would
be relatively simple to copy-paste HashMap's code, cut stuff until just a
Set of keys is left, and merge in the most specific pieces of HashSet
(basically just readObject()/writeObject()).
A+
Osvaldo
2010/3/17 Dimitris Andreou <jim.andreou at gmail.com>
>
> 2010/3/18 Rémi Forax <forax at univ-mlv.fr>
>
> Le 18/03/2010 00:59, Paulo Levi a écrit :
>>
>> My understanding is that set implementations are implemented by using
>>> Maps internally + a marker object, and that since Maps are implemented using
>>> arrays of entries this is at least n*3 references more that what is needed,
>>> since there are never multiple values.
>>>
>>> Any plans to change this? I suspect it would be a boon for programs that
>>> use the correct data structure.
>>>
>>
>> You have to test it. My guess is that there will be no difference.
>> As far as I remember, an object needs to be aligned on a valid 64bits
>> address even in 32bits mode,
>> Hotspot uses a 64bits header and the internal hash map entry contains 4
>> ints,
>> if you remove the reference corresponding to the value, the empty place
>> will be
>> considered as garbage and not used.
>>
>
> Else, you can try to remove the internal entry object but in that case
>> the hashcode of the element will be not stored anymore and you will
>> have a slowdown for all objects that doesn't cache their hashcode by
>> itself.
>>
>>
> See my second-to-last post in this thread:
>
> http://groups.google.com/group/guava-discuss/browse_thread/thread/23bc8fa5ae479698
>
> In short, I tested removing the "value" field of a HashMap's entry object,
> and indeed (through Instrumentation#getObjectSize) I observed no reduction
> in memory. I had to remove one further field (e.g. "hash") to make a
> reduction (of 8 bytes per entry).
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100318/1b035ec9/attachment.html>
More information about the core-libs-dev
mailing list