Do Set implementations waste memory?
Dimitris Andreou
jim.andreou at gmail.com
Thu Mar 18 14:10:01 UTC 2010
2010/3/18 Osvaldo Doederlein <opinali at gmail.com>
> Hi,
>
> I've read the google-groups thread; it seems you didn't test on a 64-bit
> VM. Could you do that test, with and without CompressedOops, and using the
> latest HotSpot (7b85 or 6u20ea)? I guess we should see advantages in both
> memory savings and speed, at least with CompressedOops.
>
> It is too easy to dismiss an optimization on the basis of "doesn't deliver
> benefit on a particular VM". It may be good on a different implementation,
> or a different architecture like 32 vs. 64 bits. Object headers, field
> layouts, alignments etc., are not portable, and the best rule of thumb is
> that any removed field _will_ reduce memory usage at least in some
> implementation.
>
> The oldest collection classes were designed for the needs of J2SE 1.2, a
> full decade ago. This was discussed before, IIRC there was some reply from
> Josh agreeing that some speed-vs-size tradeoffs made last decade should be
> revisited today. The extra runtime size/bloat that a specialized HashSet
> implementation would cost was reasonably significant in 1999 but is completely
> irrelevant in 2010. I mean, HashSet is a HUGELY important collection, and
> the benefit of any optimization of its implementation would spread to many
> APIs and applications.
>
> And the problem is not only the extra value field; there is also overhead
> from the extra indirection (plus an extra polymorphic call) from the HashSet
> object to the internal HashMap object. This overhead may sometimes be
> sufficient to block inlining and devirtualization, so it's a potentially
> bigger cost than just a single extra memory load (which is easily hoisted
> out of loops etc.). Look at this code inside HashSet for a much worse cost:
>
> public Iterator<E> iterator() {
>     return map.keySet().iterator();
> }
>
> Yeah, we pay the cost of building the internal HashMap's key-set view (which
> is lazily built), just to iterate the freaking HashSet. (Notice that, unlike
> HashMap, a Set is a true Collection that we can iterate directly, without
> any view collection of keys/values/entries.)
>
> IMHO all this adds evidence that the current HashSet implementation is a
> significant performance bug. We need a brand-new impl that does the hashing
> internally, without relying on HashMap, without any unused fields, extra
> indirections, or surprising costs like the one for iterator(). I guess it
> would be relatively simple to copy-paste HashMap's code, cut stuff until just
> a Set of keys is left, and merge in the HashSet-specific pieces (basically
> just readObject()/writeObject()).
>
Hi,
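For context, the current HashSet really is just a thin wrapper around an
internal HashMap that maps every element to one shared dummy value; from
memory (so modulo exact names), the relevant bits look roughly like this:

    // inside java.util.HashSet (approximate)
    private transient HashMap<E,Object> map;

    // dummy value shared by every key in the backing map
    private static final Object PRESENT = new Object();

    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public Iterator<E> iterator() {
        // goes through the map's lazily-created key-set view
        return map.keySet().iterator();
    }

Every element therefore drags along a full HashMap.Entry, value slot and all.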
Sorry, I was disappointed by the result and sent the code to /dev/null, so I
can't readily test that, but yes, it is a relatively simple exercise. In my
opinion, if someone is going to undertake the task of creating a new
HashSet, he'd better start from a blank page rather than going the
"HashMap-->snip-->snip-->HashSet" path. Even if that path yielded some gains
on certain platforms, not reducing the memory footprint on the large number
of 32-bit platforms out there would be quite a pity. (About runtime
performance, given the amount of "magic" in the JVM, I dare say even less.)
I find what Martin suggests in that thread (his second-to-last post) quite a
promising alternative: open addressing plus two parallel arrays, one for
keys and one for hashes.
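Roughly, the shape of that alternative is the toy sketch below (my own names,
no removal or null keys, resizing kept dead simple - just to show the memory
layout, not to propose an actual implementation):

    // Open-addressing set: elements and their cached hashes live in two
    // parallel arrays, so there are no per-element Entry objects at all.
    class OpenHashSet<E> {
        private Object[] keys = new Object[16]; // length is a power of two
        private int[] hashes = new int[16];     // hash of keys[i], valid iff keys[i] != null
        private int size;

        public boolean add(E key) {
            if (size + 1 > keys.length * 3 / 4) // keep load factor below 0.75
                resize(keys.length * 2);
            return insert(key, key.hashCode());
        }

        public boolean contains(Object key) {
            int h = key.hashCode();
            int mask = keys.length - 1;
            for (int i = h & mask; keys[i] != null; i = (i + 1) & mask) {
                if (hashes[i] == h && keys[i].equals(key))
                    return true;                // cached hash short-circuits equals()
            }
            return false;
        }

        private boolean insert(Object key, int h) {
            int mask = keys.length - 1;
            int i = h & mask;
            while (keys[i] != null) {           // linear probing
                if (hashes[i] == h && keys[i].equals(key))
                    return false;               // already present
                i = (i + 1) & mask;
            }
            keys[i] = key;
            hashes[i] = h;
            size++;
            return true;
        }

        private void resize(int newCapacity) {
            Object[] oldKeys = keys;
            int[] oldHashes = hashes;
            keys = new Object[newCapacity];
            hashes = new int[newCapacity];
            size = 0;
            for (int i = 0; i < oldKeys.length; i++)
                if (oldKeys[i] != null)
                    insert(oldKeys[i], oldHashes[i]);
        }
    }

Back-of-the-envelope, on a typical 32-bit VM: each HashMap.Entry today is 8
(header) + 4 (key) + 4 (value) + 4 (next) + 4 (hash) = 24 bytes, plus the
table slot pointing to it - and dropping just the value field doesn't help,
because 20 bytes gets padded back up to 24 by 8-byte alignment. The layout
above costs 8 bytes per used slot (one reference plus one int) while still
keeping the cached hash that Rémi mentions.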
Just my 2c.
I would love to know in more detail Doug's opinion on the matter.
Dimitris
>
> A+
> Osvaldo
>
> 2010/3/17 Dimitris Andreou <jim.andreou at gmail.com>
>
>>
>> 2010/3/18 Rémi Forax <forax at univ-mlv.fr>
>>
>> On 18/03/2010 00:59, Paulo Levi wrote:
>>>
>>>> My understanding is that Set implementations are implemented using
>>>> Maps internally plus a marker object, and that since Maps are implemented
>>>> using arrays of entries, this is at least n*3 references more than what is
>>>> needed, since there are never multiple values.
>>>>
>>>> Any plans to change this? I suspect it would be a boon for programs that
>>>> use the correct data structure.
>>>>
>>>
>>> You have to test it. My guess is that there will be no difference.
>>> As far as I remember, an object needs to be aligned on a 64-bit boundary
>>> even in 32-bit mode, and HotSpot uses a 64-bit header; the internal hash
>>> map entry contains 4 ints, so if you remove the reference corresponding to
>>> the value, the freed slot will just become padding and go unused.
>>>
>>
>>> Otherwise, you can try to remove the internal entry object, but in that
>>> case the hashcode of the element will no longer be stored and you will see
>>> a slowdown for all objects that don't cache their hashcode themselves.
>>>
>>>
>> See my second-to-last post in this thread:
>>
>> http://groups.google.com/group/guava-discuss/browse_thread/thread/23bc8fa5ae479698
>>
>> In short, I tested removing the "value" field of HashMap's entry object,
>> and indeed (measuring with Instrumentation#getObjectSize) I observed no
>> reduction in memory. I had to remove one further field (e.g. "hash") to get
>> any reduction at all (8 bytes per entry).
>>
>
>