RFR(S): 8015237: Parallelize string table scanning during strong root processing

Tue May 28 22:31:34 UTC 2013

Hi Thomas,

Thanks for taking a look. Replies inline....

On 5/27/2013 1:04 AM, Thomas Schatzl wrote:
> Hi John,
>
> On Fri, 2013-05-24 at 15:36 -0700, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> On 5/24/2013 3:19 PM, John Cuthbertson wrote:
>>
>>> Hi Everyone,
>>>
>>> Can I have a couple of reviewers look over these changes - the
>>> webrev is: http://cr.openjdk.java.net/~johnc/8015237/webrev.0/
>>>
>>> Summary:
>>> On some workloads we are seeing that the scan of the intern string
>>> table (among others) can sometimes take quite a while. This showed
>>> up on some FMW workloads with G1 where the scan of the string table
>>> dominated the pause time for some pauses. G1 was particularly
>>> affected since it doesn't do class unloading (and hence pruning of
>>> the string table) except at full GCs. The solution was to change the
>>> string table from being considered a single root task and treat
>>> similarly to the Java thread stacks: each GC worker claims a given
>>> number of buckets and scans the entries in those buckets.
>>>
>>> Testing
>>> Kitchensink; jprt; GC test suite. With all collectors.
> Looks good. A few minor nits:
>
> - in StringTable::possibly_parallel_oops_do() there is some commented
> debugging code left.
> - one suggestion for refactoring: the worker loop for
> StringTable::possibly_parallel_oops_do() and ::oops_do() are the same,
> except for the difference in the bounds for that loop.
>
> So maybe this could be factored out into a helper routine (e.g.
>
> StringTable::process_entries(OopClosure * f, uint start_idx, uint
> end_idx) {
>    for (uint i = start_idx; i < end_idx; i += 1) {
>      HashtableEntry<oop, mtSymbol>** p = the_table()->bucket_addr(i);
>      HashtableEntry<oop, mtSymbol>* entry = the_table()->bucket(i);
>
>      while (entry != NULL) {
>        f->do_oop((oop*)entry->literal_addr());
>
>        // Did the closure remove the literal from the table?
>        if (entry->literal() == NULL) {
>          assert(!entry->is_shared(), "immutable hashtable entry?");
>          *p = entry->next();
>          the_table()->free_entry(entry);
>        } else {
>          p = entry->next_addr();
>        }
>        entry = (HashtableEntry<oop, mtSymbol>*)HashtableEntry<oop,
> mtSymbol>::make_ptr(*p);
>      }
>    }
> }
>
> and called by both StringTable::possibly_parallel_oops_do()
> and ::oops_do() with appropriate parameters?

That's a good idea. A better name might be buckets_do, process_buckets, 
oops_buckets_do (as in oops_interpreted_frame_do).

>
> (maybe find a better name for that method :)
>
> - in SharedHeap.cpp:163 there is a misplaced colon in the call to
> Threads::possibly_parallel_oops_do(); this has been the case before as
> well, so it's not really an issue.

OK. Sure I can clean that up.

> - this is just a suggestion: if this is a problem (it does not seem so),
> and of course it also depends on the size of the table (the number of
> synchronizations), why not make the size of the chunks a function of the
> number of threads and the size of the string table?

I thought about this but decided to leave it for another CR. I actually 
like Per's idea of basing the number of buckets on the value of 
CACHE_LINE_SIZE so will probably go with that.

Expect a new webrev soon.

JohnC