RFR(S): 8015237: Parallelize string table scanning during strong root processing

Stefan Karlsson stefan.karlsson at oracle.com
Wed Jun 5 18:51:36 UTC 2013


On 6/5/13 8:34 PM, John Cuthbertson wrote:
> Hi Stefan,
>
> I think it was for aesthetics. 20 looked odd (even though it's even) :)
>
> I went with 32 because a lower number would increase the number of 
> atomic operations. 128 seemed too high for the old default table size 
> on the Haswell system I was testing on. So 32 and 64 seemed to be the 
> Goldilocks values. I chose 32.

OK. Thanks.
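
For the archive, a back-of-the-envelope sketch of that trade-off (the 
names below are illustrative, not the webrev's code): each claim is a 
single atomic add on a shared index, so the total number of atomic 
operations is roughly table_size / chunk_size.

  // One atomic add per claimed chunk, so total atomic ops is
  // ceil(N / C): halving C from 32 to 16 doubles the synchronization,
  // while C = 128 leaves few chunks to balance across workers.
  static inline int num_atomic_claims(int table_size, int chunk_size) {
    return (table_size + chunk_size - 1) / chunk_size;  // ceil(N / C)
  }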

StefanK

>
> JohnC
>
> On 6/5/2013 11:03 AM, Stefan Karlsson wrote:
>> On 6/5/13 7:05 PM, John Cuthbertson wrote:
>>> Hi Stefan,
>>>
>>> He wanted a power of two.
>>
>> And the reason being?
>>
>> It won't help the cache alignment "issues" that Per talked about 
>> earlier, if that's the reason. The array still won't be aligned to 
>> a cache-line boundary.
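>>
>> (Illustrative only; nothing like this is in the webrev. A 
>> power-of-two chunk only coincides with 64-byte cache lines when the 
>> bucket array's base address is itself line-aligned, as the sketch 
>> below checks.)
>>
>>   #include <cstddef>
>>   #include <cstdint>
>>
>>   // With 8-byte entries, a 32-entry chunk spans exactly four
>>   // 64-byte lines, but only if the base is 64-byte aligned; an
>>   // unaligned base shifts every chunk across line boundaries.
>>   static bool chunk_starts_on_line(const void* base, int start_idx,
>>                                    size_t entry_size) {
>>     uintptr_t addr = (uintptr_t)base + start_idx * entry_size;
>>     return (addr & 63) == 0;   // assume 64-byte cache lines
>>   }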
>>
>> Anyway, I'm happy with the change.
>>
>> StefanK
>>>
>>> JohnC
>>>
>>> On 6/5/2013 4:10 AM, Stefan Karlsson wrote:
>>>> On 06/04/2013 09:00 PM, John Cuthbertson wrote:
>>>>> Hi Everyone,
>>>>>
>>>>> Here's a new webrev for this change: 
>>>>> http://cr.openjdk.java.net/~johnc/8015237/webrev.1
>>>>
>>>> Looks good. Thanks for doing all the cleanups.
>>>>
>>>>>
>>>>> Changes from before:
>>>>> * Refactored the code that loops over the buckets into its own 
>>>>> routine.
>>>>> * Removed the commented out instrumentation (oops).
>>>>> * Changed the types to int to be consistent with the rest of 
>>>>> symbolTable and allow removal of the casts.
>>>>> * Increased the number of buckets per claim to 32 based upon a 
>>>>> verbal comment from John Coomes.
>>>>
>>>> Care to describe the reasoning why 32 should be better?
>>>>
>>>>> * Removed the additional worker ID parameter for the sake of peace 
>>>>> and harmony.
>>>>
>>>> Thanks.
>>>>
>>>> StefanK
>>>>
>>>>>
>>>>> Testing: jprt.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> JohnC
>>>>>
>>>>> On 5/24/2013 3:19 PM, John Cuthbertson wrote:
>>>>>> Hi Everyone,
>>>>>>
>>>>>> Can I have a couple of reviewers look over these changes - the 
>>>>>> webrev is: http://cr.openjdk.java.net/~johnc/8015237/webrev.0/
>>>>>>
>>>>>> Summary:
>>>>>> On some workloads we are seeing that the scan of the intern 
>>>>>> string table (among others) can sometimes take quite a while. 
>>>>>> This showed up on some FMW workloads with G1 where the scan of 
>>>>>> the string table dominated the pause time for some pauses. G1 was 
>>>>>> particularly affected since it doesn't do class unloading (and 
>>>>>> hence pruning of the string table) except at full GCs. The 
>>>>>> solution was to change the string table from being a single 
>>>>>> root task to being treated like the Java thread stacks: each GC 
>>>>>> worker claims a given number of buckets and scans the entries in 
>>>>>> those buckets.
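>>>>>>
>>>>>> As a rough sketch of that claiming scheme (illustrative names, 
>>>>>> not the actual patch; see the webrev for the real code), each 
>>>>>> worker grabs a chunk of buckets with a single atomic add:
>>>>>>
>>>>>>   #include <atomic>
>>>>>>
>>>>>>   static const int ClaimChunkSize = 32;     // buckets per claim
>>>>>>   static std::atomic<int> _claimed_idx{0};  // next unclaimed bucket
>>>>>>
>>>>>>   // Run by each GC worker; scan_bucket() visits every entry in
>>>>>>   // one hash bucket of the string table.
>>>>>>   void scan_string_table(int table_size, void (*scan_bucket)(int)) {
>>>>>>     for (;;) {
>>>>>>       int start = _claimed_idx.fetch_add(ClaimChunkSize);
>>>>>>       if (start >= table_size) break;          // fully claimed
>>>>>>       int end = start + ClaimChunkSize;
>>>>>>       if (end > table_size) end = table_size;  // last partial chunk
>>>>>>       for (int i = start; i < end; i++) {
>>>>>>         scan_bucket(i);
>>>>>>       }
>>>>>>     }
>>>>>>   }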
>>>>>>
>>>>>> Testing:
>>>>>> Kitchensink; jprt; GC test suite, with all collectors.
>>>>>>
>>>>>> Overhead:
>>>>>> These aren't real performance numbers, but I did measure the 
>>>>>> synchronization overhead of using 1 GC worker thread, summarized 
>>>>>> here:
>>>>>>
>>>>>>
>>>>>>         0-threads (ms)   1-thread-chunked (ms)
>>>>>> Min        0.200               0.300
>>>>>> Max        6.900               8.800
>>>>>> Avg        0.658               0.794
>>>>>>
>>>>>>
>>>>>> These were from one-hour runs of Kitchensink with ~2800 GCs in 
>>>>>> each run.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> JohnC
>>>>>
>>>>
>>>
>>
>
