RFR(S): 8015237: Parallelize string table scanning during strong root processing
Stefan Karlsson
stefan.karlsson at oracle.com
Wed Jun 5 18:03:24 UTC 2013
On 6/5/13 7:05 PM, John Cuthbertson wrote:
> Hi Stefan,
>
> He wanted a power of two.
And the reason being?
It won't help the cache alignment "issues" that Per talked about
earlier, if that's the reason. The array will still not be aligned
to a cache line boundary.
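As a rough back-of-the-envelope illustration (assuming each bucket is a
single 8-byte pointer on a 64-bit VM): a claim of 32 buckets covers
256 bytes, i.e. four 64-byte cache lines, but only if the bucket array
itself happens to start on a 64-byte boundary; a power-of-two chunk
size by itself doesn't guarantee that.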
Anyways, I'm happy with the change.
StefanK
>
> JohnC
>
> On 6/5/2013 4:10 AM, Stefan Karlsson wrote:
>> On 06/04/2013 09:00 PM, John Cuthbertson wrote:
>>> Hi Everyone,
>>>
>>> Here's a new webrev for this change:
>>> http://cr.openjdk.java.net/~johnc/8015237/webrev.1
>>
>> Looks good. Thanks for doing all the cleanups.
>>
>>>
>>> Changes from before:
>>> * Refactored the code that loops over the buckets into its own routine.
>>> * Removed the commented out instrumentation (oops).
>>> * Changed the types to int to be consistent with the rest of
>>> symbolTable and allow removal of the casts.
>>> * Increased the number of buckets per claim to 32, based upon a verbal
>>> comment from John Coomes.
>>
>> Care to describe the reasoning why 32 should be better?
>>
>>> * Removed the additional worker ID parameter for the sake of peace
>>> and harmony.
>>
>> Thanks.
>>
>> StefanK
>>
>>>
>>> Testing: jprt.
>>>
>>> Thanks,
>>>
>>> JohnC
>>>
>>> On 5/24/2013 3:19 PM, John Cuthbertson wrote:
>>>> Hi Everyone,
>>>>
>>>> Can I have a couple of reviewers look over these changes - the
>>>> webrev is: http://cr.openjdk.java.net/~johnc/8015237/webrev.0/
>>>>
>>>> Summary:
>>>> On some workloads we are seeing that the scan of the intern string
>>>> table (among others) can sometimes take quite a while. This showed
>>>> up on some FMW workloads with G1 where the scan of the string table
>>>> dominated the pause time for some pauses. G1 was particularly
>>>> affected since it doesn't do class unloading (and hence pruning of
>>>> the string table) except at full GCs. The solution was to change
>>>> the string table from being treated as a single root task to being
>>>> handled like the Java thread stacks: each GC worker claims a given
>>>> number of buckets and scans the entries in those buckets.
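As a rough illustration of the chunked-claiming scheme described above
(this is not the code from the webrev; the Bucket/Entry types, the class
and function names, and the 32-bucket chunk constant are illustrative
stand-ins, with std::atomic playing the role of HotSpot's Atomic::add):

    #include <atomic>
    #include <cstddef>
    #include <vector>

    // Illustrative stand-ins for the real HotSpot hash table types.
    struct Entry  { const char* literal; Entry* next; };
    struct Bucket { Entry* first; };

    class ParallelTableScan {
      std::vector<Bucket>& _buckets;
      std::atomic<int>     _claimed_idx;        // next unclaimed bucket index
      static const int     ClaimChunkSize = 32; // buckets claimed per fetch-and-add

     public:
      explicit ParallelTableScan(std::vector<Bucket>& buckets)
        : _buckets(buckets), _claimed_idx(0) {}

      // Called by every GC worker thread. Each fetch_add hands the caller a
      // disjoint chunk of buckets, so no bucket is scanned twice and none
      // is skipped.
      template <typename OopClosureFn>
      void possibly_parallel_oops_do(OopClosureFn do_oop) {
        const int limit = (int)_buckets.size();
        for (;;) {
          int start = _claimed_idx.fetch_add(ClaimChunkSize);
          if (start >= limit) return;           // all chunks already claimed
          int end = start + ClaimChunkSize;
          if (end > limit) end = limit;
          for (int i = start; i < end; i++) {
            for (Entry* e = _buckets[i].first; e != NULL; e = e->next) {
              do_oop(e);                        // visit this interned string
            }
          }
        }
      }
    };

The chunk size trades fewer atomic operations per worker against
finer-grained load balancing across workers.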
>>>>
>>>> Testing:
>>>> Kitchensink; jprt; GC test suite, with all collectors.
>>>>
>>>> Overhead:
>>>> These are not real performance numbers, but I did measure the
>>>> synchronization overhead of using 1 GC worker thread. The results
>>>> are summarized here:
>>>>
>>>>
>>>>       0-threads (ms)   1-thread-chunked (ms)
>>>> Min   0.200            0.300
>>>> Max   6.900            8.800
>>>> Avg   0.658            0.794
>>>>
>>>>
>>>> These were taken from one-hour runs of Kitchensink, with around 2800
>>>> GCs in each run.
>>>>
>>>> Thanks,
>>>>
>>>> JohnC
>>>
>>
>