RFR (L) JDK-8195100: Use a low latency hashtable for SymbolTable
Gerard Ziemski
gerard.ziemski at oracle.com
Thu Aug 2 15:26:53 UTC 2018
Thank you Ioi for taking a look, very much appreciated!
New webrev at http://cr.openjdk.java.net/~gziemski/8195100_rev3
> On Aug 1, 2018, at 4:27 PM, Ioi Lam <ioi.lam at oracle.com> wrote:
>
> Hi Gerard,
>
> Have you tried running HelloWorld to get a sense of the start-up impact? E.g.:
>
> perf stat -r 100 ./old/bin/java -cp . HelloWorld > /dev/null
> perf stat -r 100 ./new/bin/java -cp . HelloWorld > /dev/null
I have benchmarked it using our internal dev-submit (details in JDK-8208142), which shows up to 5% regression in simple startup benchmarks, but have not used Linux “perf”, and only briefly used Xcode’s “Instruments” to analyze it. I will be doing all of the performance work as part of JDK-8208142 (which I already started to work on)
>
> I've looked at all the files except for the GC files. Here are my comments:
>
>
> symbolTable.hpp:
>
> 87 // The symbol table holds all Symbol*s and corresponding interned strings.
> 88 // Symbol*s and literal strings should be canonicalized.
> 89 //
> 90 // The interned strings are created lazily.
> 91 //
> 92 // %note:
> 93 // - symbolTableEntrys are allocated in blocks to reduce the space overhead.
> 94 template <class T, class N> class CompactHashtable;
> 95 class CompactSymbolTableWriter;
> 96 class SerializeClosure;
> 97
> 98 class SymbolTable;
> 99 class SymbolTableConfig;
> 100 typedef ConcurrentHashTable<Symbol*,
> 101 SymbolTableConfig, mtSymbol> SymbolTableHash;
> 102
> 103 class SymbolTableCreateEntry;
> 104
> 105 class SymbolTable : public CHeapObj<mtSymbol> {
>
> Line 98 can be deleted.
Done.
>
> Lines 87~93 seem to be out of date and should be deleted:
> - The fact that SymbolTable holds symbols should be obvious. Also, it
> no longer holds all the symbols -- there's also the shared symbol table.
> - Interned strings are not stored here.
> - There's no definition of what "canonicalized" is.
> - symbolTableEntrys no longer exists.
Done.
>
> symbolTable.cpp:
>
> 342 // Otherwise, add the symbol to table. Copy to a C string first.
> 343 char stack_buf[ON_STACK_BUFFER_LENGTH];
> 344 ResourceMark rm(THREAD);
> 345 if (len <= ON_STACK_BUFFER_LENGTH) {
> 346 buffer = stack_buf;
> 347 } else {
> 348 buffer = NEW_RESOURCE_ARRAY_IN_THREAD(THREAD, char, len);
> 349 }
> 350 for (int i=0; i<len; i++) {
> 351 buffer[i] = name[i];
> 352 }
> 353 // Make sure there is no safepoint in the code above since name can't move.
> 354 // We can't include the code in NoSafepointVerifier because of the
> 355 // ResourceMark.
>
> The above is dead code and should be removed.
That’s not dead code, but in fact you did find an issue here - we should be using the “buffer”, not “name” when we add to the table. Thanks!
>
> 429 // Suggestion: Push unicode-based lookup all the way into the hashing
> 430 // and probing logic, so there is no need for convert_to_utf8 until
> 431 // an actual new Symbol* is created.
> 432 Symbol* SymbolTable::lookup_unicode(const jchar* name, int utf16_length, TRAPS) {
>
> Do you have statistics of how often this function is called? I suspect it's
> pretty low for common apps.
>
> 311 Symbol* SymbolTable::lookup(const char* name, int len, TRAPS) {
> 312 unsigned int hash = hash_symbol(name, len, SymbolTable::_alt_hash);
> 313 Symbol* sym = SymbolTable::the_table()->lookup_common(name, len, hash);
> 314 if (NULL == sym) {
> 315 sym = SymbolTable::the_table()->do_add_if_needed((char*)name, len, hash, true, CHECK_NULL);
> 316 }
> ...
> 463 void SymbolTable::add(ClassLoaderData* loader_data, const constantPoolHandle& cp,
> 464 int names_count, const char** names, int* lengths,
> 465 int* cp_indices, unsigned int* hashValues, TRAPS) {
> ...
> 471 Symbol* sym = SymbolTable::the_table()->lookup_common(name, len, hash);
> 472 if (NULL == sym) {
> 473 sym = SymbolTable::the_table()->do_add_if_needed(name, len, hash, c_heap, CHECK);
> 474 }
>
>
> Is it possible to combine the lookup_common and do_add_if_needed into a single operation,
> so you can avoid doing the search twice?
Very possible, and in fact I have already done this as part as JDK-8208142.
>
> 647 // Sharing
> 648 #if INCLUDE_CDS
>
> 647 can be deleted as it's superfluous.
>
Done.
Thank you!
>
> Thanks
> - Ioi
>
>
>
>
> On 7/30/18 9:19 AM, Gerard Ziemski wrote:
>> Please review this Enhancement, which uses the new concurrent hash table for SymbolTable.
>>
>> This is an effort similar to the one behind JDK-8195097 "Make it possible to process StringTable outside safepoint” from a while ago.
>>
>> The main expected goal here is to eliminate safepoint pauses needed to cleanup the table. This goal was achieved by using “Service Thread” to do the cleaning. Checking whether we need to clean is performed on “VM Thread”, after class unloading (we check the entire table). We also check the bucket into which we happen to be inserting a new item.
>>
>> A few things to note:
>>
>> - The SymbolTable implementation follows closely that of StringTable - we might be able to factor out common code later
>>
>> - There are a few small, but statistically significant, regressions in startup benchmarks (around 2-5%) that will be addressed later, tracked by JDK-8208142
>>
>> - There is an outstanding question about whether we can safely walk the table during a safepoint using do_scan, but without locking, tracked by JDK-8208462
>>
>> - There is a cleanup opportunity presented now to remove rehashable hash table, tracked by JDK-8208519
>>
>> - There is a new test that validates that we free dead entries when we insert new symbols (using short lived symbols via Class.forName() API)
>>
>>
>> Tested using Mach5 hs-tier1,2,3,4,5 (final test running right now…)
>>
>> Webrev: http://cr.openjdk.java.net/~gziemski/8195100_rev1/
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8195100
>>
>>
>> Cheers
>>
>
More information about the hotspot-runtime-dev
mailing list