Do Set implementations waste memory?

Thu Mar 18 14:09:20 UTC 2010

... and please consider
Bug 6812862 <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6812862> 
- provide customizable hash() algorithm in HashMap for speed tuning
again and too for HashSet.

-Ulf


Am 18.03.2010 14:54, schrieb Ulf Zibis:
> +1
>
> -Ulf
>
> Am 18.03.2010 14:22, schrieb Osvaldo Doederlein:
>>
>> The oldest collection classes were designed for the needs of J2SE 
>> 1.2, a full decade ago. This was discussed before, IIRC there was 
>> some reply from Josh agreeing that some speeed-vs-size tradeoffs made 
>> last decade should be revisited today. The extra runtime size/bloat 
>> that a specialized HashSet implementation would cost, was reasonably 
>> significant in 1999 but completely irrelevant in 2010. I mean, 
>> HashSet is a HUGELY important collection, and the benefit of any 
>> optimization of its implementation would spread to many APIs and 
>> applications.
>>
>> And the problem is not only the extra value field, there is also 
>> overhead from the extra indirection (plus extra polymorphic call) 
>> from the HashSet object to the internal HashMap object. This overhead 
>> may sometimes be sufficient to block inlining and devirtualization, 
>> so it's a potentially bigger cost than just a single extra memory 
>> load (which is easily hoisted out of loops etc.). Look at this code 
>> inside HashSet for a much worse cost:
>>
>>     public Iterator<E> iterator() {
>>         return map.keySet().iterator();
>>     }
>>
>> Yeah we pay the cost of building the internal HashMap's key-set 
>> (which is lazily-built), just to iterate the freaking HashSet. 
>> (Notice that differently from HashMap, a Set is a true Collection 
>> that we can iterate directly without any view-collection of 
>> keys/values/entries.)
>>
>> IMHO all this adds evidence that the current HashSet implementation 
>> is a significant performance bug. We need a brand-new impl that does 
>> the hashing internally, without relying on HashMap, without any 
>> unused fields, extra indirections, or surprising costs like that for 
>> iterator(). I guess it would be relatively simple to copy-paste 
>> HashMap's code, cut stuff until just a Set of keys is left, and merge 
>> in the most specific pieces of HashSet (basically just 
>> readObject()/writeObject()).
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100318/6b172e31/attachment.html>