RFR(S): 8015237: Parallelize string table scanning during strong root processing

Per Lidén per.liden at oracle.com
Wed May 29 09:44:02 UTC 2013


Hi,

On 2013-05-29 09:12, Stefan Karlsson wrote:
> On 05/29/2013 12:53 AM, John Cuthbertson wrote:
>> Hi Per,
>>
>> Thanks for looking at the code. Replies inline...
>>
>> On 5/28/2013 1:15 AM, Per Lidén wrote:
>>> Hi John,
>>>
>>> On 2013-05-25 00:19, John Cuthbertson wrote:
>>>> Hi Everyone,
>>>>
>>>> Can I have a couple of reviewers look over these changes - the webrev
>>>> is: http://cr.openjdk.java.net/~johnc/8015237/webrev.0/
>>>>
>>>> Summary:
>>>> On some workloads we are seeing that the scan of the interned string
>>>> table
>>>> (among others) can sometimes take quite a while. This showed up on some
>>>> FMW workloads with G1 where the scan of the string table dominated the
>>>> pause time for some pauses. G1 was particularly affected since it
>>>> doesn't do class unloading (and hence pruning of the string table)
>>>> except at full GCs. The solution was to change the string table from
>>>> being considered a single root task to treating it similarly to the
>>>> Java thread stacks: each GC worker claims a given number of buckets
>>>> and scans the entries in those buckets.
>>>
>>> A few minor comments/questions:
>>>
>>> * How about calculating ClaimChunkSize based on the number of buckets
>>> that fit into a cache line? Scanning all the buckets you already have
>>> nicely cached seems tempting. Something like:
>>>
>>> ClaimChunkSize = X * DEFAULT_CACHE_LINE_SIZE /
>>> sizeof(HashtableBucket<mtSymbol>);
>>>
>>> Where e.g. X=3 would yield 24 buckets for a release build on x86_64.
>>> Doing 20 (two and a half cache lines) means you would skip buckets you
>>> most likely have nicely cached.
>>
>> I like this idea.
>
> Don't you have to cache-align the buckets array to get this to work well?

Good point! Indeed, that needs to be taken into account.

/Per

>
> StefanK
>
>>
>>>
>>> * There's a debug printout/comment leftover in
>>> possibly_parallel_oops_do().
>>
>> Oops. It was to verify the claiming and check the distribution.
>>
>>>
>>> * It seems that the worker_id argument passed all the way to
>>> possibly_parallel_oops_do() could be removed, since it's never used.
>>
>> See my response to Stefan. This is the third time I've had to make such
>> a change (for various reasons). I'll remove it this time, but if I have
>> to add it back for something else in the future, I'll be more insistent
>> that it stays.
>>
>>>
>>> * Maybe factor out the inner loop and share that between
>>> possibly_parallel_oops_do() and oops_do()?
>>
>> See my response to Thomas.
>>
>> Thanks,
>>
>> JohnC
>

More information about the hotspot-gc-dev mailing list