RFR(S): 8015237: Parallelize string table scanning during strong root processing
Per Lidén
per.liden at oracle.com
Tue May 28 08:15:30 UTC 2013
Hi John,
On 2013-05-25 00:19, John Cuthbertson wrote:
> Hi Everyone,
>
> Can I have a couple of reviewers look over these changes - the webrev
> is: http://cr.openjdk.java.net/~johnc/8015237/webrev.0/
>
> Summary:
> On some workloads we are seeing that the scan of the intern string table
> (among others) can sometimes take quite a while. This showed up on some
> FMW workloads with G1 where the scan of the string table dominated the
> pause time for some pauses. G1 was particularly affected since it
> doesn't do class unloading (and hence pruning of the string table)
> except at full GCs. The solution was to change the string table from
> being considered a single root task to treating it similarly to the Java
> thread stacks: each GC worker claims a given number of buckets and scans
> the entries in those buckets.
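[The claiming scheme described in the summary above can be sketched roughly as follows. This is an illustrative model, not HotSpot's actual code: the names (kClaimChunkSize, parallel_scan) and the use of a plain atomic counter for chunk claiming are assumptions.]

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical sketch: the table's buckets are divided into fixed-size
// chunks, and each GC worker repeatedly claims the next unclaimed chunk
// via a shared atomic counter, then scans the buckets in that chunk.
constexpr std::size_t kClaimChunkSize = 32;  // illustrative chunk size

void parallel_scan(std::vector<int>& buckets,
                   std::atomic<std::size_t>& next_chunk,
                   std::atomic<long>& scanned) {
  const std::size_t limit = buckets.size();
  for (;;) {
    // Claim the next chunk of buckets; fetch_add makes the claim atomic,
    // so no two workers ever scan the same chunk.
    std::size_t start = next_chunk.fetch_add(kClaimChunkSize);
    if (start >= limit) break;  // all chunks have been claimed
    std::size_t end = std::min(start + kClaimChunkSize, limit);
    for (std::size_t i = start; i < end; i++) {
      scanned += buckets[i];  // stand-in for scanning the bucket's entries
    }
  }
}
```

[Any number of workers can run parallel_scan concurrently against the same counter; each loops until the counter passes the bucket count, so the whole table is scanned exactly once.]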
A few minor comments/questions:
* How about calculating ClaimChunkSize based on the number of buckets that
fit into a cache line? Scanning all the buckets you already have nicely
cached seems tempting. Something like:
ClaimChunkSize = X * DEFAULT_CACHE_LINE_SIZE /
sizeof(HashtableBucket<mtSymbol>);
Where e.g. X=3 would yield 24 buckets for a release build on x86_64.
Doing 20 (two and a half cache lines) means you would skip buckets you
most likely already have nicely cached.
* There's a debug printout/comment leftover in possibly_parallel_oops_do().
* It seems that the worker_id argument passed all the way to
possibly_parallel_oops_do() could be removed, since it's never used.
* Maybe factor out the inner loop and share that between
possibly_parallel_oops_do() and oops_do()?
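[The cache-line-based sizing from the first comment above works out like this. DEFAULT_CACHE_LINE_SIZE and the bucket struct here are stand-ins for HotSpot's definitions; the arithmetic assumes x86_64, where a release-build bucket holds a single 8-byte entry pointer.]

```cpp
#include <cstddef>

// Stand-in for HotSpot's cache line constant on x86_64.
constexpr std::size_t DEFAULT_CACHE_LINE_SIZE = 64;

// Stand-in for HashtableBucket<mtSymbol>: in a release build the bucket
// is just the pointer to the first entry in its chain (8 bytes on x86_64).
struct HashtableBucket {
  void* _entry;
};

// Chunk size that covers an integral number of cache lines, so workers
// never claim a chunk ending partway through a line they already pulled in.
constexpr std::size_t claim_chunk_size(std::size_t cache_lines) {
  return cache_lines * DEFAULT_CACHE_LINE_SIZE / sizeof(HashtableBucket);
}
```

[With 8-byte buckets each cache line holds 8 of them, so X=3 gives the 24-bucket chunk mentioned above, whereas a chunk of 20 stops two thirds of the way through the third line.]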
/Per
More information about the hotspot-gc-dev
mailing list