RFR (L) JDK-8195100: Use a low latency hashtable for SymbolTable

Gerard Ziemski gerard.ziemski at oracle.com
Wed Aug 8 18:56:09 UTC 2018


hi Kim,

Thank you again for more of your feedback.


> On Aug 6, 2018, at 9:18 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
> 
>> On Aug 6, 2018, at 12:39 PM, Gerard Ziemski <gerard.ziemski at oracle.com> wrote:
>> 
>> Thank you Kim very much for such thorough and detailed feedback!
>> 
>> New webrev at http://cr.openjdk.java.net/~gziemski/8195100_rev5
>> 
>> rev5 vs rev4 at http://cr.openjdk.java.net/~gziemski/8195100_rev5_4
> 
> Replies and further comments inline.
> 
> (And thank you for the incremental webrev.)
> 
>>> On Aug 2, 2018, at 6:16 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>> src/hotspot/share/classfile/symbolTable.cpp
>>> +#define PREF_AVG_LIST_LEN           8
>>> +#define END_SIZE                    17
>>> +#define REHASH_LEN                  100
>>> +#define CLEAN_DEAD_HIGH_WATER_MARK  0.0
>>> +#define ON_STACK_BUFFER_LENGTH 128
>>> 
>>> Why are these macros, rather than constants?
>> 
>> There are pro and cons for each and we could also consider "enum {PREF_AVG_LIST_LEN = 8;}” here.
>> 
>> I copied the code from stringTable.cpp and I’m not sure I see a really good reason to change both files. Do you have a strong opinion here?
> 
> Using macros for constants is commonly considered poor style. The
> HotSpot style guide says "You can almost always use an inline function
> or class instead of a macro. Use a macro only when you really need
> it." While this doesn't explicitly mention object macros, the second
> sentence certainly applies.
> 
> Using enums is not a good alternative: JDK-8153116.
> 
> That said, I won't object to making this a cleanup item for later,
> esp. since it also applies to StringTable.

Fixed. Since enums are out of the question (I agree with JDK-8153116), I went ahead and fixed this (I already touched stringTable.cpp anyhow).

Also changed 2 “enums” used in symbolTable into constants.
> 

>>> src/hotspot/share/classfile/symbolTable.cpp
>>> +static inline void _log_trace_symboltable_helper(Symbol* sym, const char* msg) {
>>> +#ifndef PRODUCT
>>> +  if (log_is_enabled(Trace, symboltable)) {
>>> +    char *s;
>>> +    {
>>> +      ResourceMark rm;
>>> +      s = sym->as_quoted_ascii();
>>> +      s = os::strdup(s);
>>> +    }
>>> +    if (s == NULL) {
>>> +      log_trace(symboltable)("%s [%s]", msg, "NULL");
>>> +    } else {
>>> +      log_trace(symboltable)("%s [%s]", msg, s);
>>> +      os::free(s);
>>> +    }
>>> +  }
>>> +#endif // PRODUCT
>>> +}
>>> 
>>> Why the strdup/free here?  Just log the result of as_quoted_ascii
>>> directly (assuming it's not NULL).
>> 
>> Fixed.
> 
> Lost in the rewrite was the necessary ResourceMark for as_quoted_ascii.
> 
> Also, the function name shouldn't have a leading underscore, per style
> guide. 

Fixed.


> 
>>> src/hotspot/share/classfile/symbolTable.cpp 
>>> +class SymbolTableConfig : public SymbolTableHash::BaseConfig {
>>> 
>>> I think there's an underlying design bug here.  But I guess go with
>>> what you've got, and I may suggest some changes later, in conjunction
>>> with changes to ConcurrentHashTable.
>> 
>> OK, you will file a followup issue?
> 
> Yes.
> 
>>> src/hotspot/share/classfile/symbolTable.cpp 
>>> +static size_t ceil_pow_2(uintx value) {
>>> +  size_t ret;
>>> +  for (ret = 1; ((size_t)1 << ret) < value; ++ret);
>>> +  return ret;
>>> +}
>>> 
>>> This is poorly named; it's really a variant of integer log2.
>> 
>> How about ceil_log2 then? I’d like the “ceil” be part of the name to signify that we will only return an integer, such that:
>> 
>> 2^ceil_log2(value) >= value.
>> 
>> I’ll also make same modification in stringTable.cpp
> 
> The suggested ceil_log2 is a better name than the actually changed to
> log2_ceil, which I read as the log2 of the ceil of x.

Fixed.


> 
>>> src/hotspot/share/classfile/symbolTable.cpp 
>>> +void SymbolTable::delete_symbol(Symbol* sym) {
>>> +  if (sym->refcount() == PERM_REFCOUNT) {
>>> +    MutexLocker ml(SymbolTable_lock); // Protect arena
>>> +    // Deleting permanent symbol should not occur very often (insert race condition),
>>> +    // so log it.
>>> +    _log_trace_symboltable_helper(sym, "Freeing permanent symbol");
>>> +    if (!arena()->Afree(sym, sym->size())) {
>>> +      _log_trace_symboltable_helper(sym, "Leaked permanent symbol");
>>> +    }
>>> +  } else {
>>> +    delete sym;
>>> +  }
>>> +}
>>> 
>>> I'm surprised there isn't an assert that sym->refcount() == 0 in the
>>> else clause.  That would also obviate the need for the assert in the
>>> only caller (SymbolTableConfig::free_node).
>> 
>> I like the assert in SymbolTableConfig::free_node better, which also handles "sym->refcount() == PERM_REFCOUNT", so when we get to "SymbolTable::delete_symbol” we know we’re good.
> 
> Sorry, but I'm not convinced. I think it's better to have the assert
> near the place that actually cares, which is delete_symbol. I think
> the assert check for PERM_REFCOUNT is just redundant. I also think the
> associated comment would be better placed in delete_symbol.


#1 static void free_node(void* memory, Symbol* const& value) {

does more than actually deletes the symbol. It also calls “SymbolTableHash::BaseConfig::free_node”, so it needs the “assert(ym->refcount() == 0” itself. It’s the only place that calls "SymbolTable::delete_symbol”, so it can guarantee that “delete_symbol” is called with a dead symbol.

#2 “SymbolTable::delete_symbol” is a wrapper for “delete symbol”, which is Symbol private. But if I could, I would call it directly in “free_node” and then the assert would have to be in “free_node"

> 

>>> src/hotspot/share/classfile/symbolTable.hpp
>>> 121   static bool _lookup_shared_first;
>>> 
>>> This variable is read and written by multiple threads concurrently, so
>>> ought to be volatile.
>> 
>> Fixed. I did the same for “_alt_hash” flag.
>> 
>> I’ll also make same modification in stringTable.cpp
> 
> _alt_hash is not subject to concurrent modification, which is a good
> thing because there is lots of code that is not set up to cope with
> that possibility.  It is only set by rehash_table, which is only
> called from safepoint cleanup.
> 
> Hm, but looking at that makes me nervous. We're running various
> cleanup tasks in parallel. What happens if one of the other parallel
> tasks needs to look up a symbol? That seems like it would lead to bad
> things if it can happen. And it's not entirely obvious to me that it
> can't happen.  Similarly for stringtable.
> 
> Given the way it's used, it seems like it should be a StringTable
> ordinary member rather than a static member, and the new table
> produced by rehashing (that wants it turned on) should be constructed
> with it being true.

I really like the idea of the item table being immutable wrt hash algorithm, so I refactored the code to do this. I like how it simplified the hash selection algorithm and how _murmur_seed detail is hidden away. If we like it, I can do the same for stringTable later.


>>> src/hotspot/share/classfile/symbolTable.cpp
>>> SymbolTable::lookup(const Symbol*, int, int, TRAPS)
>>> There is also a comment for this overload in the header that is no
>>> longer relevant.
>>> 
>> 
>> All fixed.
> 
> The comment is still there.

Fixed.


> 
>>> src/hotspot/share/classfile/symbolTable.hpp
>>> 126   volatile int _items;
>>> 127   DEFINE_PAD_MINUS_SIZE(1, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int));
>>> 128   volatile int _uncleaned_items;
>>> 129   DEFINE_PAD_MINUS_SIZE(2, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int));
>>> 
>>> I'm not entirely sure what the intent of the padding here is, but it
>>> looks strange to me.  Also, do you have measurements that suggest
>>> padding matters?  And why aren't other members that are modified by
>>> multiple threads concurrently not padded?
>> 
>> I copied that code from stringTable.hpp and at the time I was the one who asked Robbin whether the volatile fields needed padding, which seemed to be a pattern used in hotspot, so now we have it. Now, however, that I’m more familiar with the code, I’d rather have no padding and add it back only as needed and determined by the work in JDK-8208142
> 
> OK.
> 
>>> src/hotspot/share/classfile/symbolTable.hpp
>>> 
>>> I find the naming confusing that lookup() does find or insert, but
>>> helper functions like lookup_dynamic just find.
>> 
>> Me too, I intend on cleaning this more as part of JDK-8208142
> 
> OK.
> 
>>> src/hotspot/share/classfile/symbolTable.hpp
>>> 181   size_t table_size(Thread* thread = NULL);
>>> 
>>> No caller passes a thread argument.  I think the argument exists only
>>> because in the implementation the thread is needed for RCU
>>> protection.  That's an implementation detail that shouldn't propogate
>>> out to the API here.  So I think the argument should be eliminated.
>> 
>> SymbolTable::grow(JavaThread* jt) uses it:
>> 
>> 678:  _current_size = table_size(jt);
> 
> Yeah, I forgot about that one call.  But it doesn't matter for my
> argument; the elimination of the thread argument there would make no
> performance difference and makes the API cleaner.

Fixed, did the same for stringTable.


>>> src/hotspot/share/classfile/symbolTable.cpp 
>>> 478     _created = SymbolTable::the_table()->allocate_symbol((const u1*)_name, _len, _heap, _thread);
>>> 
>>> [pre-existing] Why does the Symbol take a const u1* name argument?
>>> Especially since that's not how it's used?  It should probably be
>>> const char*.
>> 
>> Agree, I prefer symbolTable APIs to consistently use “const char *” throughout and only cast to (const u1*) when using Symbol APIs.
> 
> To me it seems like a bug that the Symbol constructor uses u1 as the
> element type of its name argument. I also wonder why it uses jbyte
> rather than char as the element type of the body, but maybe there is
> some underlying string that would be pulled by changing that. Doing
> anything about this could be a followup cleanup RFE.

Filed JDK-8209138


>>> src/hotspot/share/classfile/symbolTable.cpp 
>>> 518   bool rehash_warning;
>>> 519   bool clean_hint;
>>> 
>>> Please initialize these variables, so a reader doesn't need to go
>>> digging through the get_insert_lazy to figure out whether they are
>>> alwasy set, or possibly only conditionally set.
>>> 
>>> Similarly for line 387.
>> 
>> I had to do the exact same exercise myself to make sure they are set inside the concurrent hash table code, and didn’t like that - fixed.
> 
> Thanks!
> 
>>> src/hotspot/share/utilities/globalCounter.cpp
>>> 64   // Handle bootstrap
>>> 65   if (Threads::number_of_threads() == 0) {
>>> 66     return;
>>> 67   }
>>> 
>>> What is this about?  number_of_threads only refers to Java threads,
>>> but the iteration should deal with there being none, right?  Or does
>>> this get called so early (because of SymbolTable changes) that even
>>> that iteration setup doesn't work?
>>> 
>>> If not that, then maybe we're calling before the VMThread exists, and
>>> that's the real problem?  In which case a different test, differently
>>> located, seems necessary.
>> 
>> We needed it when I was bringing up the code and I tried removing it to see whether we still need it and it seems that we do (details on the crash in the bug itself). Maybe Robbin can shed more light here?
> 
> The problem here is that we're using the StringTable (and therefore
> GlobalCounter) before the main thread has been registered.  That
> registration happens in create_vm() (Threads::add(main_thread), line
> 3729), which follows the call to init_globals() (where the symbol
> table usage is occurring) in create_vm().
> 
> Maybe moving the Threads::add earlier would fix the problem?  But this
> initialization code is very ordering sensitive, so I don't know if
> that would work.
> 
> Is there a bug for this problem with GlobalCounter?

Filed JDK-8209139 to track this.


> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/symbolTable.cpp
> 128 SymbolTable::SymbolTable() : _local_table(NULL), _current_size(0), _has_work(0),
> 129   _needs_rehashing(false), _items_count(0), _uncleaned_items_count(0),
> 130   _symbols_removed(0), _symbols_counted(0) {
> 
> The order of the initializers is not declaration order, so this will
> fail to compile as soon as Thomas's -Wreorder change is pushed.
> 
> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/symbolTable.cpp
> 334   SymbolTableLookup(Thread* thread, const char* key, int len, uintx hash)
> 335   : _thread(thread), _hash(hash), _str(key), _len(len) {}
> 
> Another out of order initializer list.

Fixed - I wish the compiler could rearrange the initialization order to match declaration order on its own.


> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/symbolTable.cpp
> 727   Atomic::add((size_t)stdc._processed, &_symbols_counted);
> 
> Why not make SymbolTableDeleteCheck::_processed by size_t, making this
> cast unnecessary?
> 
> And why not make SymbolTableDoDelete::_deleted also size_t.
> 
> These changes would require making the logging call at the end of
> clean_dead_entries use SIZE_FORMAT rather than INT32_FORMAT.

Fixed. Also made similar change in stringTable. 


> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/stringTable.hpp
>  87   size_t add_items_count_to_clean(size_t ndead);
> 
> This name change doesn't seem right.

How about “add_dead_items_count”? Also made similar change in symbolTable. 


cheers



More information about the hotspot-runtime-dev mailing list