RFR(S): 8015237: Parallelize string table scanning during strong root processing
Stefan Karlsson
stefan.karlsson at oracle.com
Wed Jun 5 18:51:36 UTC 2013
On 6/5/13 8:34 PM, John Cuthbertson wrote:
> Hi Stefan,
>
> I think it was for aesthetics. 20 looked odd (even though it's even) :)
>
> I went with 32 because a lower number would increase the number of
> atomic operations. 128 seemed too high for the old default table size
> on the haswell system I was testing on. So 32 and 64 seemed to be the
> Goldilocks values. I chose 32.
OK. Thanks.
StefanK
>
> JohnC
>
> On 6/5/2013 11:03 AM, Stefan Karlsson wrote:
>> On 6/5/13 7:05 PM, John Cuthbertson wrote:
>>> Hi Stefan,
>>>
>>> He wanted a power of two.
>>
>> And the reason being?
>>
>> It won't help the cache alignment "issues" that Per talked about
>> earlier, if that's the reason. The array will still not be aligned
>> against a cache line size.
>>
>> Anyways, I'm happy with the change.
>>
>> StefanK
>>>
>>> JohnC
>>>
>>> On 6/5/2013 4:10 AM, Stefan Karlsson wrote:
>>>> On 06/04/2013 09:00 PM, John Cuthbertson wrote:
>>>>> Hi Everyone,
>>>>>
>>>>> Here's a new webrev for this change:
>>>>> http://cr.openjdk.java.net/~johnc/8015237/webrev.1
>>>>
>>>> Looks good. Thanks for doing all the cleanups.
>>>>
>>>>>
>>>>> Changes from before:
>>>>> * Refactored the code that loops over the buckets into its own
>>>>> routine.
>>>>> * Removed the commented out instrumentation (oops).
>>>>> * Changed the types to int to be consistent with the rest of
>>>>> symbolTable and allow removal of the casts.
>>>>> * Increased the number of buckets per claim to 32 based upon a
>>>>> verbal comment from John Coomes.
>>>>
>>>> Care to describe the reasoning why 32 should be better?
>>>>
>>>>> * Removed the additional worker ID parameter for the sake of peace
>>>>> and harmony.
>>>>
>>>> Thanks.
>>>>
>>>> StefanK
>>>>
>>>>>
>>>>> Testing: jprt.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> JohnC
>>>>>
>>>>> On 5/24/2013 3:19 PM, John Cuthbertson wrote:
>>>>>> Hi Everyone,
>>>>>>
>>>>>> Can I have a couple of reviewers look over these changes - the
>>>>>> webrev is: http://cr.openjdk.java.net/~johnc/8015237/webrev.0/
>>>>>>
>>>>>> Summary:
>>>>>> On some workloads we are seeing that the scan of the intern
>>>>>> string table (among others) can sometimes take quite a while.
>>>>>> This showed up on some FMW workloads with G1 where the scan of
>>>>>> the string table dominated the pause time for some pauses. G1 was
>>>>>> particularly affected since it doesn't do class unloading (and
>>>>>> hence pruning of the string table) except at full GCs. The
>>>>>> solution was to change the string table from being treated as a
>>>>>> single root task to being treated like the Java thread stacks:
>>>>>> each GC worker claims a given number of buckets and scans the
>>>>>> entries in those buckets.
>>>>>>
>>>>>> Testing
>>>>>> Kitchensink; jprt; GC test suite. With all collectors.
>>>>>>
>>>>>> Overhead:
>>>>>> These are not real performance numbers, but I did measure the
>>>>>> synchronization overhead of using 1 GC worker thread. The results
>>>>>> are summarized here:
>>>>>>
>>>>>>
>>>>>>        0-threads (ms)    1-thread-chunked (ms)
>>>>>> Min    0.200             0.300
>>>>>> Max    6.900             8.800
>>>>>> Avg    0.658             0.794
>>>>>>
>>>>>> These were from 1-hour runs of Kitchensink with ~2800
>>>>>> GCs in each run.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> JohnC
>>>>>
>>>>
>>>
>>
>