RFR (M): 8027476: Improve performance of Stringtable unlink, 8027455: Improve symbol table scan times during gc pauses

Thu Jan 16 10:14:33 PST 2014

Thomas,

In symbolTable.cpp shouldn't this be buckets_unlink_or_oops_do() ?

+void StringTable::buckets_unlink_or_do(BoolObjectClosure* is_alive, OopClosure* f, int start_idx, int end_idx, int* processed, int* removed) {

Actually, I see where you got this.  StringTable::buckets_do should be 
buckets_oops_do too.

The symbolTable.cpp/hpp changes look fine.  The code for SymbolTable and 
StringTable is more similar now but there isn't enough common code to 
add to the superclass HashTable<>.   We should separate these into their 
own files later.  I only clicked on the gc files, but you have reviews 
for these.

Thanks for waiting for my review,
Coleen

On 01/13/2014 06:36 AM, Thomas Schatzl wrote:
> Hi Mikael,
>
> On Fri, 2014-01-10 at 17:24 +0100, Mikael Gerdin wrote:
>> Thomas,
>>
>> Overall I think the change is fine.
>> I have some comments inline,
> Thanks for the review :)
>
>> On Friday 10 January 2014 10.27.49 Thomas Schatzl wrote:
>>> Hi all,
>>>
>>>    could I have reviews for the following performance change?
>>>
>>> It implements parallelization of string table and symbol table scan
>>> during G1 full GC and remark.
>>>
>>> This decreases the respective string table/symbol table scan times
>>> almost linearly depending on the number of threads used.
>>>
>>> The change is quite straightforward, similar to string table scan during
>>> initial mark there are new possibly_parallel_unlink() methods for both
>>> string and symbol table, that are later used in a new taskset.
>>>
>>> Task distribution works as before, using a single claim value for each
>>> of the tables (reusing the one for the string table) on the hash table
>>> array.
>>>
>>> The impact on total remark/full gc time is significant, in conjunction
>>> with JDK-8027454 (RFR coming soon) remark time is halved.
>>>
>>> I decided to merge these two fixes into a single webrev because these
>>> two things are closely related, and the shared code between the two
>>> issues is very large, and the actual code to implement one or the other
>>> is small and easily understood (imo). Splitting would add lots of code
>>> that would first need to be added and then removed again too.
>>>
>>> I filed two follow-up CRs to use these new methods in the other parallel
>>> collectors too (JDK-8031473), and another one for fine-tuning
>>> (JDK-8031472) some parameters.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~tschatzl/8027476/webrev
>>
>> -  // Now clean up stale oops in StringTable
>> -  StringTable::unlink(&g1_is_alive);
>> -  // Clean up unreferenced symbols in symbol table.
>> -  SymbolTable::unlink();
>> +  g1h->unlink_string_and_symbol_table(&g1_is_alive);
>>
>> Why did we call StringTable::unlink here?
>> AFAICT G1's initial mark calls process_strong_roots with SO_Strings with the
>> G1ParScanAndMarkExtRootClosure, which in turn will make all strings which are
>> in the StringTable at init mark to be live. Have you ever seen any strings
>> being unloaded at this point? ( See
>> https://bugs.openjdk.java.net/browse/JDK-8015332 and
>> http://openjdk.java.net/jeps/156 )
> No, that's impossible. I kept it in this place because I wanted the
> change separate.
>
>> +  ~G1StringSymbolTableUnlinkTask() {
>> +    guarantee(!_process_strings || StringTable::parallel_claimed_index() >=
>> _initial_string_table_size,
>> +              err_msg("claim value "SIZE_FORMAT" after unlink less than
>> initial string table size "SIZE_FORMAT,
>> +                      StringTable::parallel_claimed_index(),
>> _initial_string_table_size));
>> +  }
>>
>> Why do you only check the string table size here and not the symbol table?
> This is I think a leftover from the initial code: I initially checked
> that the number of processed entries was equal to what the table
> contained. This does not work for the symbol table, as in case of CDS we
> can terminate the loop over the buckets early.
>
> I fixed this (added code) and incorporated Bengt's suggestions in
> http://cr.openjdk.java.net/~tschatzl/8027454/webrev.1/
>
> I also changed several size_t variables to int because the underlying
> hashmap only supports int's for sizes anyway, and just requires too much
> casting around with no gain.
>
> Thanks a lot for looking over the change,
> Thomas
>
>