RFR (L) JDK-8195100: Use a low latency hashtable for SymbolTable

Kim Barrett kim.barrett at oracle.com
Thu Aug 2 23:16:13 UTC 2018


> On Aug 2, 2018, at 2:19 PM, Gerard Ziemski <gerard.ziemski at oracle.com> wrote:
> New webrev at http://cr.openjdk.java.net/~gziemski/8195100_rev4

Some of these comments are from earlier versions and might have been obsoleted.
And some were from before I applied your patch to make my own webrev that doesn’t
have a messed up frame view of stringTable.cpp; such comments contain diffs for
context but no line numbers.

------------------------------------------------------------------------------
src/hotspot/share/classfile/compactHashtable.hpp
 236     return (_entry_count==0);

Need's spaces around operator, per HotSpot style guide.
I think the parens here are pointless, but that's my preference.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp
+#define PREF_AVG_LIST_LEN           8
+#define END_SIZE                    17
+#define REHASH_LEN                  100
+#define CLEAN_DEAD_HIGH_WATER_MARK  0.0
+#define ON_STACK_BUFFER_LENGTH 128

Why are these macros, rather than constants?

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp
+static inline void _log_trace_symboltable_helper(Symbol* sym, const char* msg) {
+#ifndef PRODUCT
+  if (log_is_enabled(Trace, symboltable)) {
+    char *s;
+    {
+      ResourceMark rm;
+      s = sym->as_quoted_ascii();
+      s = os::strdup(s);
+    }
+    if (s == NULL) {
+      log_trace(symboltable)("%s [%s]", msg, "NULL");
+    } else {
+      log_trace(symboltable)("%s [%s]", msg, s);
+      os::free(s);
+    }
+  }
+#endif // PRODUCT
+}

Why the strdup/free here?  Just log the result of as_quoted_ascii
directly (assuming it's not NULL).

------------------------------------------------------------------------------ 
src/hotspot/share/classfile/symbolTable.cpp 
+class SymbolTableConfig : public SymbolTableHash::BaseConfig {

I think there's an underlying design bug here.  But I guess go with
what you've got, and I may suggest some changes later, in conjunction
with changes to ConcurrentHashTable.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
+static size_t ceil_pow_2(uintx value) {
+  size_t ret;
+  for (ret = 1; ((size_t)1 << ret) < value; ++ret);
+  return ret;
+}

This is poorly named; it's really a variant of integer log2.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
+void SymbolTable::delete_symbol(Symbol* sym) {
+  if (sym->refcount() == PERM_REFCOUNT) {
+    MutexLocker ml(SymbolTable_lock); // Protect arena
+    // Deleting permanent symbol should not occur very often (insert race condition),
+    // so log it.
+    _log_trace_symboltable_helper(sym, "Freeing permanent symbol");
+    if (!arena()->Afree(sym, sym->size())) {
+      _log_trace_symboltable_helper(sym, "Leaked permanent symbol");
+    }
+  } else {
+    delete sym;
+  }
+}

I'm surprised there isn't an assert that sym->refcount() == 0 in the
else clause.  That would also obviate the need for the assert in the
only caller (SymbolTableConfig::free_node).

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
+double SymbolTable::get_load_factor() {
+  return (_items*1.0)/_current_size;
+}
+
+double SymbolTable::get_dead_factor() {
+  return (_uncleaned_items*1.0)/_current_size;
+}

Multiplying by 1.0 seems like an odd way to convert to double.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
+size_t SymbolTable::item_added() {
+  return Atomic::add(1, &(SymbolTable::the_table()->_items));
+}

+void SymbolTable::item_removed() {
+  Atomic::add(1, &(SymbolTable::the_table()->_symbols_removed));
+  Atomic::sub(1, &(SymbolTable::the_table()->_items));
+}

Atomic::inc and dec might be clearer than Atomic::add or sub of 1.
That also avoids any need to be aware of the type of _items or
_symbols_removed (see next comment).

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.hpp
Why are _symbols_removed, _symbols_counted, and _items int rather than
an unsigned type?  These are counters, and should never be negative.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.hpp
_items seems like a not very descriptive name.  It's the number of
symbols in the table.  size() is the usual C++ container operation for
the number of elements.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.hpp
 121   static bool _lookup_shared_first;

This variable is read and written by multiple threads concurrently, so
ought to be volatile.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.hpp
 153   Symbol* do_lookup(char* name, int len, uintx hash);
 154   Symbol* do_add_if_needed(char* name, int len, uintx hash, bool heap, TRAPS);

Why do these take a non-const char* name?  Shouldn't the name be
const?  There are several casts that would not be necessary if the
names were const here.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp
   char* name;
...
     name = (char*)sym->base() + begin;

[pre-existing] Why isn't that "const char* name;"?

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp
SymbolTable::lookup(const char*, int len, TRAPS)

Rather than calling lookup_common, it seems like it would be better
here to do a lookup_shared and use the result if non-NULL, else
call do_add_if_needed.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp
SymbolTable::lookup(const Symbol*, int, int, TRAPS)
   // Make sure there is no safepoint in the code above since name can't move.
   // We can't include the code in NoSafepointVerifier because of the
   // ResourceMark.

[pre-existing] Maybe this comment is a left-over from permgen?
Symbols (and their names) don't move.

And the use of a NoSafepointVerifier in an earlier part of that
function seems similar.

Coleen suggested asserting sym->refcount() != 0 on entry, to ensure
the symbol and its name can't be deleted out from under.

And getting rid of the NoSafepointVerifier would allow the early part
of the function to be cleaned up.

Oh, but then in the do_add_if_needed call, name is being used rather
than buffer.  And that's okay.  So all the work to copy the name is a
waste of time.

[Later: Much of the above has been addressed by responses to Ioi's
comments.  What's below is about the rev4 version of this function.]

There is also a comment for this overload in the header that is no
longer relevant.

The NoSafepointVerifier seems superfluous.  What's it for?  Nothing
here can be moved by GC.  The symbol argument better have a non-zero
refcount, so there's no risk of it being deleted.

Why isn't this looking first in the shared table?  It is going
straight to do_add_if_needed, which doesn't consider the shared table.
This seems like a bug.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp
+  bool equals(Symbol** value, bool* is_dead) {
...
+      if (sym->try_increment_refcount()) {
...
+      } else {
+        *is_dead = (sym->refcount() == 0);

Better here I think would be

  assert(sym->refcount() == 0, ...);
  *is_dead = true;

as try_increment_refcount only returns false when the refcount is
zero.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp
+  bool equals(Symbol** value, bool* is_dead) {
...
+    if (sym->equals(_str, _len)) {
...
+    } else {
+      return false;

Shouldn't this be reporting whether the symbol is dead?  E.g. here add

  *is_dead = (sym->refcount() == 0);

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp
+class SymbolTableLookupChar : StackObj {

What is the trailing "Char" in the name for?  Why isn't
SymbolTableLookup sufficient?

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp
+class SymbolTableLookupChar : StackObj {
...
+  Symbol* _found;

_found is recorded but is inaccessible and so never used.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.hpp

Why are the len arguments for the various lookup functions and the
like int rather than uint?

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.hpp
 126   volatile int _items;
 127   DEFINE_PAD_MINUS_SIZE(1, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int));
 128   volatile int _uncleaned_items;
 129   DEFINE_PAD_MINUS_SIZE(2, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int));

I'm not entirely sure what the intent of the padding here is, but it
looks strange to me.  Also, do you have measurements that suggest
padding matters?  And why aren't other members that are modified by
multiple threads concurrently not padded?

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.hpp
 120   // Set if one bucket is out of balance due to hash algorithm deficiency

This comment seems to be misplaced.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.hpp

I find the naming confusing that lookup() does find or insert, but
helper functions like lookup_dynamic just find.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
 440   for (int i=0; i<names_count; i++) {
...
 448     assert(sym->refcount()!=0, "lookup should have incremented the count");

Spaces around operators.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
 441     char *name = (char*)names[i];

The case here shouldn't be needed; see earlier comment about missing
const qualification for some name parameters.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
 444     Symbol* sym = SymbolTable::the_table()->lookup_common(name, len, hash);
 445     if (sym == NULL) {
 446       sym = SymbolTable::the_table()->do_add_if_needed(name, len, hash, c_heap, CHECK);
 447     }

Again here, lookup_shared + conditional do_add_if_needed seems like it
would be better.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
 463   void _assert_for_name(Symbol* sym, const char* where) const {
...
 465     for (int i=0; i<_len; i++) {

This function's name should not have a leading underscore.  See style guide.
And more spaces around operators...

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.hpp
 181   size_t table_size(Thread* thread = NULL);

No caller passes a thread argument.  I think the argument exists only
because in the implementation the thread is needed for RCU
protection.  That's an implementation detail that shouldn't propogate
out to the API here.  So I think the argument should be eliminated.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
 478     _created = SymbolTable::the_table()->allocate_symbol((const u1*)_name, _len, _heap, _thread);

[pre-existing] Why does the Symbol take a const u1* name argument?
Especially since that's not how it's used?  It should probably be
const char*.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
 480 #ifdef ASSERT
 481     _assert_for_name(_created, "operator()()");
 482 #endif

Instead of wrapping the calls in #ifdef ASSERT, make the body of
_assert_for_name be #ifdef ASSERT, so it's an (implicitly inlined)
empty body in non-debug build.

And again at lines 503 and 508.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
 518   bool rehash_warning;
 519   bool clean_hint;

Please initialize these variables, so a reader doesn't need to go
digging through the get_insert_lazy to figure out whether they are
alwasy set, or possibly only conditionally set.

Similarly for line 387.

------------------------------------------------------------------------------
src/hotspot/share/classfile/symbolTable.cpp 
 718     if (sym->refcount() == 0) {
 719       return true;
 720     } else {
 721       return false;
 722     }

return sym->refcount() == 0;

------------------------------------------------------------------------------
src/hotspot/share/utilities/globalCounter.cpp
  64   // Handle bootstrap
  65   if (Threads::number_of_threads() == 0) {
  66     return;
  67   }

What is this about?  number_of_threads only refers to Java threads,
but the iteration should deal with there being none, right?  Or does
this get called so early (because of SymbolTable changes) that even
that iteration setup doesn't work?

If not that, then maybe we're calling before the VMThread exists, and
that's the real problem?  In which case a different test, differently
located, seems necessary.

------------------------------------------------------------------------------



More information about the hotspot-runtime-dev mailing list