RFR(S): 8015237: Parallelize string table scanning during strong root processing

John Cuthbertson john.cuthbertson at oracle.com
Wed Jun 5 17:05:12 UTC 2013


Hi Stefan,

He (John Coomes) wanted the chunk size to be a power of two.

JohnC

On 6/5/2013 4:10 AM, Stefan Karlsson wrote:
> On 06/04/2013 09:00 PM, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> Here's a new webrev for this change: 
>> http://cr.openjdk.java.net/~johnc/8015237/webrev.1
>
> Looks good. Thanks for doing all the cleanups.
>
>>
>> Changes from before:
>> * Refactored the code that loops over the buckets into its own routine.
>> * Removed the commented out instrumentation (oops).
>> * Changed the types to int to be consistent with the rest of 
>> symbolTable and allow removal of the casts.
>> * Increased the number of buckets per claim to 32 based upon a verbal 
>> comment from John Coomes.
>
> Care to describe the reasoning why 32 should be better?
>
>> * Removed the additional worker ID parameter for the sake of peace 
>> and harmony.
>
> Thanks.
>
> StefanK
>
>>
>> Testing: jprt.
>>
>> Thanks,
>>
>> JohnC
>>
>> On 5/24/2013 3:19 PM, John Cuthbertson wrote:
>>> Hi Everyone,
>>>
>>> Can I have a couple of reviewers look over these changes - the 
>>> webrev is: http://cr.openjdk.java.net/~johnc/8015237/webrev.0/
>>>
>>> Summary:
>>> On some workloads we are seeing that the scan of the interned string 
>>> table (among others) can sometimes take quite a while. This showed 
>>> up on some FMW workloads with G1, where the scan of the string table 
>>> dominated the pause time for some pauses. G1 was particularly 
>>> affected since it doesn't do class unloading (and hence pruning of 
>>> the string table) except at full GCs. The solution was to stop 
>>> treating the string table as a single root task and instead treat it 
>>> similarly to the Java thread stacks: each GC worker claims a given 
>>> number of buckets and scans the entries in those buckets.
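>>>
>>> For anyone curious, here is a minimal self-contained sketch of that 
>>> claiming scheme (illustrative only -- the names, the chunk-size 
>>> constant and the use of std::atomic are assumptions for the sketch, 
>>> not the actual webrev code, which uses HotSpot's own primitives):
>>>
>>>   #include <algorithm>
>>>   #include <atomic>
>>>
>>>   // Shared claim counter, reset to zero before each parallel scan.
>>>   static std::atomic<int> claimed_idx{0};
>>>
>>>   // Number of hash buckets handed out per claim.
>>>   static const int ClaimChunkSize = 32;
>>>
>>>   // Placeholder for the per-chunk work: walk the entries of buckets
>>>   // [start, end) and apply the strong-root closure to each oop.
>>>   static void scan_buckets(int start, int end) { /* ... */ }
>>>
>>>   // Each GC worker runs this loop, repeatedly claiming the next
>>>   // chunk of buckets until the whole table has been covered.
>>>   void possibly_parallel_scan(int table_size) {
>>>     for (;;) {
>>>       // fetch_add returns the previous value, i.e. this worker's
>>>       // first bucket index.
>>>       int start = claimed_idx.fetch_add(ClaimChunkSize);
>>>       if (start >= table_size) {
>>>         break;  // every chunk has been claimed by some worker
>>>       }
>>>       int end = std::min(table_size, start + ClaimChunkSize);
>>>       scan_buckets(start, end);
>>>     }
>>>   }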
>>>
>>> Testing:
>>> Kitchensink; jprt; GC test suite (with all collectors).
>>>
>>> Overhead:
>>> These are not real performance numbers, but I did some measurements 
>>> of the synchronization overhead of using 1 GC worker thread. They are 
>>> summarized here:
>>>
>>>
>>>         0-threads (ms)   1-thread-chunked (ms)
>>> Min     0.200            0.300
>>> Max     6.900            8.800
>>> Avg     0.658            0.794
>>>
>>>
>>> These were from one-hour runs of Kitchensink, with around 2800 GCs 
>>> in each run.
>>>
>>> Thanks,
>>>
>>> JohnC
>>
>
