RFR(S): 8015237: Parallelize string table scanning during strong root processing

Mon May 27 08:04:17 UTC 2013

Hi John,

On Fri, 2013-05-24 at 15:36 -0700, John Cuthbertson wrote:
> Hi Everyone,
>
> On 5/24/2013 3:19 PM, John Cuthbertson wrote:
>
> > Hi Everyone,
>>
> > Can I have a couple of reviewers look over these changes - the
> > webrev is: http://cr.openjdk.java.net/~johnc/8015237/webrev.0/
>>
> > Summary:
> > On some workloads we are seeing that the scan of the intern string
> > table (among others) can sometimes take quite a while. This showed
> > up on some FMW workloads with G1 where the scan of the string table
> > dominated the pause time for some pauses. G1 was particularly
> > affected since it doesn't do class unloading (and hence pruning of
> > the string table) except at full GCs. The solution was to change the
> > string table from being considered a single root task and treat
> > similarly to the Java thread stacks: each GC worker claims a given
> > number of buckets and scans the entries in those buckets.
>>
> > Testing
> > Kitchensink; jprt; GC test suite. With all collectors.

Looks good. A few minor nits:

- in StringTable::possibly_parallel_oops_do() there is some commented
debugging code left.
- one suggestion for refactoring: the worker loop for
StringTable::possibly_parallel_oops_do() and ::oops_do() are the same,
except for the difference in the bounds for that loop.

So maybe this could be factored out into a helper routine (e.g.

StringTable::process_entries(OopClosure * f, uint start_idx, uint
end_idx) {
  for (uint i = start_idx; i < end_idx; i += 1) {
    HashtableEntry<oop, mtSymbol>** p = the_table()->bucket_addr(i);
    HashtableEntry<oop, mtSymbol>* entry = the_table()->bucket(i);

    while (entry != NULL) {
      f->do_oop((oop*)entry->literal_addr());

      // Did the closure remove the literal from the table?
      if (entry->literal() == NULL) {
        assert(!entry->is_shared(), "immutable hashtable entry?");
        *p = entry->next();
        the_table()->free_entry(entry);
      } else {
        p = entry->next_addr();
      }
      entry = (HashtableEntry<oop, mtSymbol>*)HashtableEntry<oop,
mtSymbol>::make_ptr(*p);
    }
  }
}

and called by both StringTable::possibly_parallel_oops_do()
and ::oops_do() with appropriate parameters?

(maybe find a better name for that method :)

- in SharedHeap.cpp:163 there is a misplaced colon in the call to
Threads::possibly_parallel_oops_do(); this has been the case before as
well, so it's not really an issue.

>>
> > Overhead:
> > Not real performance numbers but I did some measurement of the
> > synchronization overhead of using 1 GC worker thread. They are
> > summarized here:
>>
>>
> > 
> > 
> > 0-threads (ms)
> > 
> > 1-thread-chunked (ms)
> > 
> > Min
> > 0.200
> > 
> > 0.300
> > 
> > Max
> > 6.900
> > 
> > 8.800
> > 
> > Avg
> > 0.658
> > 
> > 0.794
> > 
> > 
> > These were from 1 hour long runs of Kitchensink with around ~2800
> > GCs in each run.

- this is just a suggestion: if this is a problem (it does not seem so),
and of course it also depends on the size of the table (the number of
synchronizations), why not make the size of the chunks a function of the
number of threads and the size of the string table?

Thomas