RFR (L) JDK-8195100: Use a low latency hashtable for SymbolTable

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Tue Aug 7 01:00:58 UTC 2018


Gerard,  the changes in rev5 look good.
Thanks,
Coleen

On 8/6/18 12:39 PM, Gerard Ziemski wrote:
> Thank you Kim very much for such thorough and detailed feedback!
>
> New webrev at http://cr.openjdk.java.net/~gziemski/8195100_rev5
>
> rev5 vs rev4 at http://cr.openjdk.java.net/~gziemski/8195100_rev5_4
>
>
>
>> On Aug 2, 2018, at 6:16 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>
>>> On Aug 2, 2018, at 2:19 PM, Gerard Ziemski <gerard.ziemski at oracle.com> wrote:
>>> New webrev at http://cr.openjdk.java.net/~gziemski/8195100_rev4
>> Some of these comments are from earlier versions and might have been obsoleted.
>> And some were from before I applied your patch to make my own webrev that doesn’t
>> have a messed up frame view of stringTable.cpp; such comments contain diffs for
>> context but no line numbers.
>>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/compactHashtable.hpp
>> 236     return (_entry_count==0);
>>
>> Need's spaces around operator, per HotSpot style guide.
>> I think the parens here are pointless, but that's my preference.
> Changed to:
>
> 236     return (_entry_count == 0);
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +#define PREF_AVG_LIST_LEN           8
>> +#define END_SIZE                    17
>> +#define REHASH_LEN                  100
>> +#define CLEAN_DEAD_HIGH_WATER_MARK  0.0
>> +#define ON_STACK_BUFFER_LENGTH 128
>>
>> Why are these macros, rather than constants?
> There are pro and cons for each and we could also consider "enum {PREF_AVG_LIST_LEN = 8;}” here.
>
> I copied the code from stringTable.cpp and I’m not sure I see a really good reason to change both files. Do you have a strong opinion here?
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +static inline void _log_trace_symboltable_helper(Symbol* sym, const char* msg) {
>> +#ifndef PRODUCT
>> +  if (log_is_enabled(Trace, symboltable)) {
>> +    char *s;
>> +    {
>> +      ResourceMark rm;
>> +      s = sym->as_quoted_ascii();
>> +      s = os::strdup(s);
>> +    }
>> +    if (s == NULL) {
>> +      log_trace(symboltable)("%s [%s]", msg, "NULL");
>> +    } else {
>> +      log_trace(symboltable)("%s [%s]", msg, s);
>> +      os::free(s);
>> +    }
>> +  }
>> +#endif // PRODUCT
>> +}
>>
>> Why the strdup/free here?  Just log the result of as_quoted_ascii
>> directly (assuming it's not NULL).
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +class SymbolTableConfig : public SymbolTableHash::BaseConfig {
>>
>> I think there's an underlying design bug here.  But I guess go with
>> what you've got, and I may suggest some changes later, in conjunction
>> with changes to ConcurrentHashTable.
> OK, you will file a followup issue?
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +static size_t ceil_pow_2(uintx value) {
>> +  size_t ret;
>> +  for (ret = 1; ((size_t)1 << ret) < value; ++ret);
>> +  return ret;
>> +}
>>
>> This is poorly named; it's really a variant of integer log2.
> How about ceil_log2 then? I’d like the “ceil” be part of the name to signify that we will only return an integer, such that:
>
> 2^ceil_log2(value) >= value.
>
> I’ll also make same modification in stringTable.cpp
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +void SymbolTable::delete_symbol(Symbol* sym) {
>> +  if (sym->refcount() == PERM_REFCOUNT) {
>> +    MutexLocker ml(SymbolTable_lock); // Protect arena
>> +    // Deleting permanent symbol should not occur very often (insert race condition),
>> +    // so log it.
>> +    _log_trace_symboltable_helper(sym, "Freeing permanent symbol");
>> +    if (!arena()->Afree(sym, sym->size())) {
>> +      _log_trace_symboltable_helper(sym, "Leaked permanent symbol");
>> +    }
>> +  } else {
>> +    delete sym;
>> +  }
>> +}
>>
>> I'm surprised there isn't an assert that sym->refcount() == 0 in the
>> else clause.  That would also obviate the need for the assert in the
>> only caller (SymbolTableConfig::free_node).
> I like the assert in SymbolTableConfig::free_node better, which also handles "sym->refcount() == PERM_REFCOUNT", so when we get to "SymbolTable::delete_symbol” we know we’re good.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +double SymbolTable::get_load_factor() {
>> +  return (_items*1.0)/_current_size;
>> +}
>> +
>> +double SymbolTable::get_dead_factor() {
>> +  return (_uncleaned_items*1.0)/_current_size;
>> +}
>>
>> Multiplying by 1.0 seems like an odd way to convert to double.
> Yes, I thought same as well when I saw this code the first time (I copied it from stringTable). I’ll use “(double)” cast instead.
>
> I’ll also make same modification in stringTable.cpp
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +size_t SymbolTable::item_added() {
>> +  return Atomic::add(1, &(SymbolTable::the_table()->_items));
>> +}
>>
>> +void SymbolTable::item_removed() {
>> +  Atomic::add(1, &(SymbolTable::the_table()->_symbols_removed));
>> +  Atomic::sub(1, &(SymbolTable::the_table()->_items));
>> +}
>>
>> Atomic::inc and dec might be clearer than Atomic::add or sub of 1.
>> That also avoids any need to be aware of the type of _items or
>> _symbols_removed (see next comment).
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.hpp
>> Why are _symbols_removed, _symbols_counted, and _items int rather than
>> an unsigned type?  These are counters, and should never be negative.
>
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.hpp
>> _items seems like a not very descriptive name.  It's the number of
>> symbols in the table.  size() is the usual C++ container operation for
>> the number of elements.
> Agree, changed to _items_count
>
> I’ll also make same modification in stringTable.cpp
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.hpp
>> 121   static bool _lookup_shared_first;
>>
>> This variable is read and written by multiple threads concurrently, so
>> ought to be volatile.
> Fixed. I did the same for “_alt_hash” flag.
>
> I’ll also make same modification in stringTable.cpp
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.hpp
>> 153   Symbol* do_lookup(char* name, int len, uintx hash);
>> 154   Symbol* do_add_if_needed(char* name, int len, uintx hash, bool heap, TRAPS);
>>
>> Why do these take a non-const char* name?  Shouldn't the name be
>> const?  There are several casts that would not be necessary if the
>> names were const here.
> That’s the inherited code from the original symbol table, but I'll clean it up.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> char* name;
>> ...
>>    name = (char*)sym->base() + begin;
>>
>> [pre-existing] Why isn't that "const char* name;”?
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> SymbolTable::lookup(const char*, int len, TRAPS)
>>
>> Rather than calling lookup_common, it seems like it would be better
>> here to do a lookup_shared and use the result if non-NULL, else
>> call do_add_if_needed.
> If we didn’t need “lookup_common", then we could just call “do_add_if_needed”. I’ll leave such work as part of JDK-8208142, if that’s OK.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> SymbolTable::lookup(const Symbol*, int, int, TRAPS)
>> // Make sure there is no safepoint in the code above since name can't move.
>> // We can't include the code in NoSafepointVerifier because of the
>> // ResourceMark.
>>
>> [pre-existing] Maybe this comment is a left-over from permgen?
>> Symbols (and their names) don't move.
>>
>> And the use of a NoSafepointVerifier in an earlier part of that
>> function seems similar.
>>
>> Coleen suggested asserting sym->refcount() != 0 on entry, to ensure
>> the symbol and its name can't be deleted out from under.
>>
>> And getting rid of the NoSafepointVerifier would allow the early part
>> of the function to be cleaned up.
>>
>> Oh, but then in the do_add_if_needed call, name is being used rather
>> than buffer.  And that's okay.  So all the work to copy the name is a
>> waste of time.
>>
>> [Later: Much of the above has been addressed by responses to Ioi's
>> comments.  What's below is about the rev4 version of this function.]
>>
>> There is also a comment for this overload in the header that is no
>> longer relevant.
>>
>> The NoSafepointVerifier seems superfluous.  What's it for?  Nothing
>> here can be moved by GC.  The symbol argument better have a non-zero
>> refcount, so there's no risk of it being deleted.
> All fixed.
>
>
>> Why isn't this looking first in the shared table?  It is going
>> straight to do_add_if_needed, which doesn't consider the shared table.
>> This seems like a bug.
> Good catch, fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +  bool equals(Symbol** value, bool* is_dead) {
>> ...
>> +      if (sym->try_increment_refcount()) {
>> ...
>> +      } else {
>> +        *is_dead = (sym->refcount() == 0);
>>
>> Better here I think would be
>>
>> assert(sym->refcount() == 0, ...);
>> *is_dead = true;
>>
>> as try_increment_refcount only returns false when the refcount is
>> zero.
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +  bool equals(Symbol** value, bool* is_dead) {
>> ...
>> +    if (sym->equals(_str, _len)) {
>> ...
>> +    } else {
>> +      return false;
>>
>> Shouldn't this be reporting whether the symbol is dead?  E.g. here add
>>
>> *is_dead = (sym->refcount() == 0);
> I caught that already too, fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +class SymbolTableLookupChar : StackObj {
>>
>> What is the trailing "Char" in the name for?  Why isn't
>> SymbolTableLookup sufficient?
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> +class SymbolTableLookupChar : StackObj {
>> ...
>> +  Symbol* _found;
>>
>> _found is recorded but is inaccessible and so never used.
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.hpp
>>
>> Why are the len arguments for the various lookup functions and the
>> like int rather than uint?
> Fixed, using size_t.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.hpp
>> 126   volatile int _items;
>> 127   DEFINE_PAD_MINUS_SIZE(1, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int));
>> 128   volatile int _uncleaned_items;
>> 129   DEFINE_PAD_MINUS_SIZE(2, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int));
>>
>> I'm not entirely sure what the intent of the padding here is, but it
>> looks strange to me.  Also, do you have measurements that suggest
>> padding matters?  And why aren't other members that are modified by
>> multiple threads concurrently not padded?
> I copied that code from stringTable.hpp and at the time I was the one who asked Robbin whether the volatile fields needed padding, which seemed to be a pattern used in hotspot, so now we have it. Now, however, that I’m more familiar with the code, I’d rather have no padding and add it back only as needed and determined by the work in JDK-8208142
>
> I also changed the fields into unsigned, using size_t.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.hpp
>> 120   // Set if one bucket is out of balance due to hash algorithm deficiency
>>
>> This comment seems to be misplaced.
> Fixed.
>
> I’ll also make same modification in stringTable.hpp
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.hpp
>>
>> I find the naming confusing that lookup() does find or insert, but
>> helper functions like lookup_dynamic just find.
> Me too, I intend on cleaning this more as part of JDK-8208142
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> 440   for (int i=0; i<names_count; i++) {
>> ...
>> 448     assert(sym->refcount()!=0, "lookup should have incremented the count");
>>
>> Spaces around operators.
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> 441     char *name = (char*)names[i];
>>
>> The case here shouldn't be needed; see earlier comment about missing
>> const qualification for some name parameters.
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> 444     Symbol* sym = SymbolTable::the_table()->lookup_common(name, len, hash);
>> 445     if (sym == NULL) {
>> 446       sym = SymbolTable::the_table()->do_add_if_needed(name, len, hash, c_heap, CHECK);
>> 447     }
>>
>> Again here, lookup_shared + conditional do_add_if_needed seems like it
>> would be better.
> I’ll look into it as part of JDK-8208142
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> 463   void _assert_for_name(Symbol* sym, const char* where) const {
>> ...
>> 465     for (int i=0; i<_len; i++) {
>>
>> This function's name should not have a leading underscore.  See style guide.
>> And more spaces around operators…
> Fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.hpp
>> 181   size_t table_size(Thread* thread = NULL);
>>
>> No caller passes a thread argument.  I think the argument exists only
>> because in the implementation the thread is needed for RCU
>> protection.  That's an implementation detail that shouldn't propogate
>> out to the API here.  So I think the argument should be eliminated.
> SymbolTable::grow(JavaThread* jt) uses it:
>
> 678:  _current_size = table_size(jt);
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> 478     _created = SymbolTable::the_table()->allocate_symbol((const u1*)_name, _len, _heap, _thread);
>>
>> [pre-existing] Why does the Symbol take a const u1* name argument?
>> Especially since that's not how it's used?  It should probably be
>> const char*.
> Agree, I prefer symbolTable APIs to consistently use “const char *” throughout and only cast to (const u1*) when using Symbol APIs.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> 480 #ifdef ASSERT
>> 481     _assert_for_name(_created, "operator()()");
>> 482 #endif
>>
>> Instead of wrapping the calls in #ifdef ASSERT, make the body of
>> _assert_for_name be #ifdef ASSERT, so it's an (implicitly inlined)
>> empty body in non-debug build.
>>
>> And again at lines 503 and 508.
> I was wondering about that - fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> 518   bool rehash_warning;
>> 519   bool clean_hint;
>>
>> Please initialize these variables, so a reader doesn't need to go
>> digging through the get_insert_lazy to figure out whether they are
>> alwasy set, or possibly only conditionally set.
>>
>> Similarly for line 387.
> I had to do the exact same exercise myself to make sure they are set inside the concurrent hash table code, and didn’t like that - fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/classfile/symbolTable.cpp
>> 718     if (sym->refcount() == 0) {
>> 719       return true;
>> 720     } else {
>> 721       return false;
>> 722     }
>>
>> return sym->refcount() == 0;
> I think I used to have more code in there - fixed.
>
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/utilities/globalCounter.cpp
>> 64   // Handle bootstrap
>> 65   if (Threads::number_of_threads() == 0) {
>> 66     return;
>> 67   }
>>
>> What is this about?  number_of_threads only refers to Java threads,
>> but the iteration should deal with there being none, right?  Or does
>> this get called so early (because of SymbolTable changes) that even
>> that iteration setup doesn't work?
>>
>> If not that, then maybe we're calling before the VMThread exists, and
>> that's the real problem?  In which case a different test, differently
>> located, seems necessary.
> We needed it when I was bringing up the code and I tried removing it to see whether we still need it and it seems that we do (details on the crash in the bug itself). Maybe Robbin can shed more light here?
>
>
> Cheers
>
>
>



More information about the hotspot-runtime-dev mailing list