RFR JDK-8011200 (was 7143928) : (coll) Optimize for Empty ArrayList and HashMap

Wed Apr 10 01:17:23 UTC 2013

Hello all;

Another update to the webrev incorporating feedback from Alan and Martin.

http://cr.openjdk.java.net/~mduigou/JDK-8011200/3/webrev/

- The key change in this revision is that the capacity of both cloned() and deserialized instances is determined by the number of entries NOT the capacity of the source HashMap. As Alan says, we don't know how the clone or deserialized instance is going to be used. Without maintaining the original user specified initial capacity we're possibly better off not allocating excessive bucket table.

Mike

On Apr 6 2013, at 15:02 , Mike Duigou wrote:

> Hello all;
> 
> Another, and hopefully the final, update to the webrev for this issue. The revised webrev is here:
> 
> http://cr.openjdk.java.net/~mduigou/JDK-8011200/1/webrev/
> 
> The important changes in this revision:
> 
> - I've removed the serialData change in HashMap. The implementation now reads the capacity and gracefully handles non-power of 2 values.
> 
> - I'm not entirely convinced that having serialization emulate clone() for capacity handling is the best answer. I might also want to change clone() to size it's result based upon the number of mappings in the source rather its the capacity. Anybody have strong feelings about this to suggest one behaviour is obviously better?
> 
> Any other final thoughts?
> 
> Mike
> 
> On Apr 2 2013, at 15:50 , Mike Duigou wrote:
> 
>> Hello again;
>> 
>> I have updated the patch with the received review feedback. The revised webrev is here:
>> 
>> http://cr.openjdk.java.net/~mduigou/JDK-8011200/1/webrev/
>> 
>> The important changes in this revision:
>> 
>> - The behaviour of the readObject/writeObject serialization for both classes now more closely mirrors the behaviour of clone(). For ArrayList this means that the deserialized list has a capacity the same as the size. ie. as if trimToSize() was called. For HashMap, the opposite is true, the capacity is the same as was in effect when the object was serialized. (HashMap also tries to protect itself from nonsensical/harmful input). The implementation changes to serialization preserve forward and backward compatibility--all serialized objects are compatible with all implementations. I will file a spec change request for the addition of ", a power of 2" to the @serialData tag for this existing but previously unstated requirement.
>> 
>> - Use of Arrays.fill has been reverted. I did change one fill case so that the loop can be optimized. (size field was being updated with each iteration). I very slightly expanded the docs.
>> 
>> This is starting to look like a nice set of changes.
>> 
>> Mike
>> 
>> On Apr 1 2013, at 21:44 , Mike Duigou wrote:
>> 
>>> Hello all;
>>> 
>>> Last night while pushing another changeset I accidentally pushed a changeset for JKD-7143928. Since the review and testing was not complete on this issue I have backed out that changeset and created a new bug number to continue development. The new webrev to complete the review is:
>>> 
>>> http://cr.openjdk.java.net/~mduigou/JDK-8011200/0/webrev/
>>> 
>>> It is currently unchanged from the last posted changeset for 7143928.
>>> 
>>> Mike
>>> 
>>> On Apr 1 2013, at 19:00 , Mike Duigou wrote:
>>> 
>>>> Hello all;
>>>> 
>>>> I have posted an updated version of the empty ArrayList and HashMap patch.
>>>> 
>>>> http://cr.openjdk.java.net/~mduigou/JDK-7143928/1/webrev/
>>>> 
>>>> This revised implementation introduces *no new fields* to either class. For ArrayList the lazy allocation of the backing array occurs only if the list is created at default size. According to our performance analysis team, approximately 85% of ArrayList instances are created at default size so this optimization will be valid for an overwhelming majority of cases. 
>>>> 
>>>> For HashMap, creative use is made of the threshold field to track the requested initial size until the bucket array is needed. On the read side the empty map case is tested with isEmpty(). On the write size a comparison of (table == EMPTY_TABLE) is used to detect the need to inflate the bucket array. In readObject there's a little more work to try to choose an efficient initial capacity.
>>>> 
>>>> Mike
>>>> 
>>>> On Mar 26 2013, at 17:25 , Mike Duigou wrote:
>>>> 
>>>>> Hello all;
>>>>> 
>>>>> This is a review for optimization work that came out of internal analysis of Oracle's Java applications. It's based upon analysis that shows that in large applications as much as 10% of maps and lists are initialized but never receive any entries. A smaller number spend a large proportion of their lifetime empty. We've found similar results across other workloads as well. This patch is not a substitute for pre-sizing your collections and maps--doing so will *always* have better results.
>>>>> 
>>>>> This patch extends HashMap and ArrayList to provide special handling for newly created instances that avoids creating the backing array until needed. There is a very small additional cost for detecting when to inflate the map or list that is measurable in interpreted tests but disappears in JITed code. 
>>>>> 
>>>>> http://cr.openjdk.java.net/~mduigou/JDK-7143928/0/webrev/
>>>>> 
>>>>> We expect that should this code prove successful in Java 8 it will be backported to Java 7 updates.
>>>>> 
>>>>> The unit test may appear to be somewhat unrelated. It was created after resolving a bug in an early version of this patch to detect the issue encountered (LinkedHashMap.init() was not being called in readObject() when the map was empty).
>>>>> 
>>>>> Mike
>>>> 
>>> 
>> 
>