Do Set implementations waste memory?
Ulf Zibis
Ulf.Zibis at gmx.de
Thu Mar 18 13:54:17 UTC 2010
+1
-Ulf
Am 18.03.2010 14:22, schrieb Osvaldo Doederlein:
>
> The oldest collection classes were designed for the needs of J2SE 1.2,
> a full decade ago. This was discussed before, IIRC there was some
> reply from Josh agreeing that some speeed-vs-size tradeoffs made last
> decade should be revisited today. The extra runtime size/bloat that a
> specialized HashSet implementation would cost, was reasonably
> significant in 1999 but completely irrelevant in 2010. I mean, HashSet
> is a HUGELY important collection, and the benefit of any optimization
> of its implementation would spread to many APIs and applications.
>
> And the problem is not only the extra value field, there is also
> overhead from the extra indirection (plus extra polymorphic call) from
> the HashSet object to the internal HashMap object. This overhead may
> sometimes be sufficient to block inlining and devirtualization, so
> it's a potentially bigger cost than just a single extra memory load
> (which is easily hoisted out of loops etc.). Look at this code inside
> HashSet for a much worse cost:
>
> public Iterator<E> iterator() {
> return map.keySet().iterator();
> }
>
> Yeah we pay the cost of building the internal HashMap's key-set (which
> is lazily-built), just to iterate the freaking HashSet. (Notice that
> differently from HashMap, a Set is a true Collection that we can
> iterate directly without any view-collection of keys/values/entries.)
>
> IMHO all this adds evidence that the current HashSet implementation is
> a significant performance bug. We need a brand-new impl that does the
> hashing internally, without relying on HashMap, without any unused
> fields, extra indirections, or surprising costs like that for
> iterator(). I guess it would be relatively simple to copy-paste
> HashMap's code, cut stuff until just a Set of keys is left, and merge
> in the most specific pieces of HashSet (basically just
> readObject()/writeObject()).
>
More information about the core-libs-dev
mailing list