RFR(S): 8015237: Parallelize string table scanning during strong root processing

John Cuthbertson john.cuthbertson at oracle.com
Wed Jun 5 18:34:15 UTC 2013


Hi Stefan,

I think it was for aesthetics. 20 looked odd (even though it's even) :)

I went with 32 because a lower number would increase the number of 
atomic operations. 128 seemed too high for the old default table size on 
the Haswell system I was testing on. So 32 and 64 seemed to be the 
Goldilocks values. I chose 32.
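
To make the trade-off concrete, here is a rough sketch of the kind of 
chunked claiming involved (plain C++ with illustrative names, not the 
actual symbolTable.cpp code): each worker claims a chunk of 32 buckets 
with a single atomic add, so a larger chunk size means fewer atomic 
operations per scan at the cost of coarser load balancing across workers.

    #include <algorithm>
    #include <atomic>

    // Illustrative sketch only -- not the actual HotSpot code.
    static const int ClaimChunkSize = 32;       // buckets claimed per atomic add
    static std::atomic<int> claimed_idx(0);     // shared among GC workers

    // Hypothetical per-worker loop: repeatedly claim a chunk of buckets
    // and scan the entries in each claimed bucket.
    void possibly_parallel_buckets_do(int table_size) {
      for (;;) {
        // One atomic operation claims ClaimChunkSize buckets.
        int start_idx = claimed_idx.fetch_add(ClaimChunkSize);
        if (start_idx >= table_size) {
          return;                               // every bucket has been claimed
        }
        int end_idx = std::min(table_size, start_idx + ClaimChunkSize);
        for (int i = start_idx; i < end_idx; ++i) {
          // apply the oop closure to every entry in bucket i
        }
      }
    }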

JohnC

On 6/5/2013 11:03 AM, Stefan Karlsson wrote:
> On 6/5/13 7:05 PM, John Cuthbertson wrote:
>> Hi Stefan,
>>
>> He wanted a power of two.
>
> And the reason being?
>
> It won't help the cache alignment "issues" that Per talked about 
> earlier, if that's the reason. The array will still not be aligned 
> to a cache-line boundary.
>
> Anyways, I'm happy with the change.
>
> StefanK
>>
>> JohnC
>>
>> On 6/5/2013 4:10 AM, Stefan Karlsson wrote:
>>> On 06/04/2013 09:00 PM, John Cuthbertson wrote:
>>>> Hi Everyone,
>>>>
>>>> Here's a new webrev for this change: 
>>>> http://cr.openjdk.java.net/~johnc/8015237/webrev.1
>>>
>>> Looks good. Thanks for doing all the cleanups.
>>>
>>>>
>>>> Changes from before:
>>>> * Refactored the code that loops over the buckets into its own routine.
>>>> * Removed the commented out instrumentation (oops).
>>>> * Changed the types to int to be consistent with the rest of 
>>>> symbolTable and allow removal of the casts.
>>>> * Increased the number of buckets per claim to 32 based upon a 
>>>> verbal comment from John Coomes.
>>>
>>> Care to describe the reasoning why 32 should be better?
>>>
>>>> * Removed the additional worker ID parameter for the sake of peace 
>>>> and harmony.
>>>
>>> Thanks.
>>>
>>> StefanK
>>>
>>>>
>>>> Testing: jprt.
>>>>
>>>> Thanks,
>>>>
>>>> JohnC
>>>>
>>>> On 5/24/2013 3:19 PM, John Cuthbertson wrote:
>>>>> Hi Everyone,
>>>>>
>>>>> Can I have a couple of reviewers look over these changes - the 
>>>>> webrev is: http://cr.openjdk.java.net/~johnc/8015237/webrev.0/
>>>>>
>>>>> Summary:
>>>>> On some workloads we are seeing that the scan of the interned string 
>>>>> table (among others) can sometimes take quite a while. This showed 
>>>>> up on some FMW workloads with G1 where the scan of the string 
>>>>> table dominated the pause time for some pauses. G1 was 
>>>>> particularly affected since it doesn't do class unloading (and 
>>>>> hence pruning of the string table) except at full GCs. The 
>>>>> solution was to change the string table from being considered a 
>>>>> single root task and treat it similarly to the Java thread stacks: 
>>>>> each GC worker claims a given number of buckets and scans the 
>>>>> entries in those buckets.
>>>>>
>>>>> Testing:
>>>>> Kitchensink; jprt; GC test suite. With all collectors.
>>>>>
>>>>> Overhead:
>>>>> Not real performance numbers, but I did some measurements of the 
>>>>> synchronization overhead of using 1 GC worker thread. They are 
>>>>> summarized here:
>>>>>
>>>>>
>>>>>        0-threads (ms)    1-thread-chunked (ms)
>>>>> Min    0.200             0.300
>>>>> Max    6.900             8.800
>>>>> Avg    0.658             0.794
>>>>>
>>>>>
>>>>> These were from one-hour runs of Kitchensink with around 2800 GCs 
>>>>> in each run.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> JohnC
>>>>
>>>
>>
>
