RFR (M) 8207359: Make SymbolTable increment_refcount disallow zero
    Gerard Ziemski 
    gerard.ziemski at oracle.com
       
    Fri Jul 20 14:51:03 UTC 2018
    
    
  
hi Coleen,
I agree with the principle of the workaround, but shouldn’t it be more something like:
  int count = 0;
   for (HashtableEntry<Symbol*, mtSymbol>* e = bucket(index); e != NULL; e = e->next()) {
     count++;  // count all entries in this bucket, not just ones with same hash
     if (e->hash() == hash) {
       Symbol* sym = e->literal();
-      if (sym->equals(name, len) && sym->try_increment_refcount()) {
+      // Skip checking already dead symbols in the bucket.
+      if (sym->refcount() == 0) {
+        count--;   // Don't count this symbol towards rehashing.
+      } else if (sym->equals(name, len) {
+	 if (sym->try_increment_refcount()) {
           // something is referencing this symbol now.
           return sym;
+        } else {
+          count--;   // Don't count this symbol towards rehashing.
+        }
       }
     }
   }
cheers
> On Jul 19, 2018, at 5:14 PM, coleen.phillimore at oracle.com wrote:
> 
> 
> Hi, There is a closed test that does 100,000 lookups on a class that fails resolution, so creates 100,000 Symbols with TempNewSymbol. This results in many zeroed refcounted Symbols in the table which increases lookup time with the current SymbolTable.  With the new concurrent symbol table, which this change is intended to support, the zero refcount symbols are cleaned up on insert and concurrently.
> 
> I have a workaround so that this test doesn't time out.   These are the times for this test on my machine.
> 
> old hashtable no patch: 7.32 seconds
> without workaround: 367 seconds (which can time out on a slow machine)
> with workaround:  61.075 seconds
> with new hashtable: 9.135 seconds
> 
> There are several ways to fix the old hashtable so that it cleans more frequently for this situation but it's not worth doing with the new concurrent hashtable coming.
> 
> open webrev at http://cr.openjdk.java.net/~coleenp/03.incr/webrev
> 
> Thanks,
> Coleen
> 
> On 7/19/18 3:09 PM, coleen.phillimore at oracle.com wrote:
>> 
>> 
>> On 7/19/18 3:07 PM, Ioi Lam wrote:
>>> Looks good!
>> 
>> Thanks, Ioi!
>> Coleen
>>> 
>>> Thanks
>>> 
>>> - Ioi
>>> 
>>> 
>>> On 7/19/18 5:34 AM, coleen.phillimore at oracle.com wrote:
>>>> Please review the revision to this change.   Summary:
>>>> 
>>>> * made decrement_refcount() use CAS loop.
>>>> * fixed duplicated logic in try_increment_refcount() thanks to Kim
>>>> * added gtest case for decrement_refcount.
>>>> * fixed SA code.
>>>> * added a bunch of comments
>>>> 
>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8207359.02/webrev
>>>> 
>>>> Retested with hs-tier1-3.
>>>> Thanks,
>>>> Coleen
>>>> 
>>>> On 7/18/18 10:50 PM, coleen.phillimore at oracle.com wrote:
>>>>> 
>>>>> 
>>>>> On 7/18/18 6:35 PM, Ioi Lam wrote:
>>>>>> 
>>>>>> 
>>>>>> On 7/18/18 2:45 PM, coleen.phillimore at oracle.com wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> On 7/18/18 5:14 PM, Ioi Lam wrote:
>>>>>>>> Hi Coleen,
>>>>>>>> 
>>>>>>>> The changes look good! The new operations on _length_and_refcount are much cleaner than my old ATOMIC_SHORT_PAIR hack.
>>>>>>> 
>>>>>>> Yes, this makes more sense to me.
>>>>>>>> 
>>>>>>>> symbolTable.cpp:
>>>>>>>> 
>>>>>>>>  SymbolTable::lookup_dynamic() {
>>>>>>>>  ...
>>>>>>>>  214       Symbol* sym = e->literal();
>>>>>>>>  215       if (sym->equals(name, len) && sym->try_increment_refcount()) {
>>>>>>>>  216         // something is referencing this symbol now.
>>>>>>>>  217         return sym;
>>>>>>>>  218       }
>>>>>>>> 
>>>>>>>> 
>>>>>>>> symbol.cpp:
>>>>>>>> 
>>>>>>>>  221 void Symbol::increment_refcount() {
>>>>>>>>  222   if (refcount() != PERM_REFCOUNT) { // not a permanent symbol
>>>>>>>>  223     if (!try_increment_refcount()) {
>>>>>>>>  224 #ifdef ASSERT
>>>>>>>>  225       print();
>>>>>>>>  226 #endif
>>>>>>>>  227       fatal("refcount has gone to zero");
>>>>>>>>  228     }
>>>>>>>>  229     NOT_PRODUCT(Atomic::inc(&_total_count);)
>>>>>>>>  230   }
>>>>>>>>  231 }
>>>>>>>> 
>>>>>>>>  246 // Atomically increment while checking for zero, zero is bad.
>>>>>>>>  247 bool Symbol::try_increment_refcount() {
>>>>>>>>  248   uint32_t old_value = _length_and_refcount;  // fetch once
>>>>>>>>  249   int refc = extract_refcount(old_value);
>>>>>>>>  250
>>>>>>>>  251   if (refc == PERM_REFCOUNT) {
>>>>>>>>  252     return true;
>>>>>>>>  253   } else if (refc == 0) {
>>>>>>>>  254     return false; // effectively dead, can't revive
>>>>>>>>  255   }
>>>>>>>>  256
>>>>>>>>  257   uint32_t now;
>>>>>>>>  258   while ((now = Atomic::cmpxchg(old_value + 1, &_length_and_refcount, old_value)) != old_value) {
>>>>>>>>  259     // failed to increment, check refcount again.
>>>>>>>>  260     refc = extract_refcount(now);
>>>>>>>>  261     if (refc == 0) {
>>>>>>>>  262       return false; // just died
>>>>>>>>  263     } else if (refc == PERM_REFCOUNT) {
>>>>>>>>  264       return true; // just became permanent
>>>>>>>>  265     }
>>>>>>>>  266     old_value = now; // refcount changed, try again
>>>>>>>>  267   }
>>>>>>>>  268   return true;
>>>>>>>>  269 }
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So is it valid for Symbol::try_increment_refcount() to return false? SymbolTable::lookup_dynamic() seems to suggest YES, but Symbol::increment_refcount() seems to suggest NO.
>>>>>>> 
>>>>>>> True.  If you are looking up a symbol and someone other thread has decremented the refcount to zero, this symbol should not be returned.  My test exercises this code even without the concurrent hashtable.  When the hashtable is concurrent, a zero-ed Symbol could be deallocated so we don't want to return it.
>>>>>>> 
>>>>>> I think the following should be added as a comment in increment_refcount().
>>>>>>> In the case where you call increment_refcount() not during lookup, it is assumed that you have a symbol with a non-zero refcount and it can't go away while you are holding it.
>>>>> 
>>>>> Ok, added.
>>>>>> 
>>>>>>>> 
>>>>>>>> If it's always an invalid condition, I think the fatal() should be moved inside try_increment_refcount.
>>>>>>>> 
>>>>>>> 
>>>>>>> It isn't fatal at lookup.  The lookup must skip a zero-ed entry.
>>>>>>>> Otherwise, I think you need to add comments in all 3 places, to say when it's possible to get a 0 refcount, and when it's not. And, it might be worth expanding on why "zero is bad" :-)
>>>>>>> 
>>>>>>> How about this comment to try_increment_refcount:
>>>>>>> 
>>>>>>> // Increment refcount while checking for zero.  If the Symbol's refcount becomes zero
>>>>>>> // a thread could be concurrently removing the Symbol. This is used during SymbolTable
>>>>>>> // lookup to avoid reviving a dead Symbol.
>>>>>> Sounds good.
>>>>> 
>>>>> Thanks, Ioi.
>>>>> Coleen
>>>>>> 
>>>>>> Thanks
>>>>>> - Ioi
>>>>>> 
>>>>>>>> 
>>>>>>>> My guess is:
>>>>>>>> + if you're doing a lookup, you might be seeing Symbols that have already been marked for deletion, which is indicated by a 0 refcount. You want to skip such Symbols.
>>>>>>>> 
>>>>>>>> + if you're incrementing the refcount, that means you're holding a valid Symbol, which means this Symbol should have never been marked for deletion.
>>>>>>>> 
>>>>>>>> Is this correct?
>>>>>>> 
>>>>>>> Yes, both true.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 7/17/18 2:08 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>> 
>>>>>>>>> Gerard, thank you for the code review.
>>>>>>>>> 
>>>>>>>>> On 7/17/18 4:13 PM, Gerard Ziemski wrote:
>>>>>>>>>> Thank you Coleen (and Kim)!
>>>>>>>>>> 
>>>>>>>>>> #1 Need copyright year updates:
>>>>>>>>>> 
>>>>>>>>>> src/hotspot/share/oops/symbol.cpp
>>>>>>>>>> src/hotspot/share/classfile/symbolTable.cpp
>>>>>>>>>> src/hotspot/share/classfile/compactHashtable.inline.hpp
>>>>>>>>> 
>>>>>>>>> Yes, I'll update with my commit.
>>>>>>>>>> 
>>>>>>>>>> #2 What’s the purpose of this code in src/hotspot/share/oops/symbol.cpp
>>>>>>>>>> 
>>>>>>>>>>    38   STATIC_ASSERT(max_symbol_length == ((1 << 16) - 1));
>>>>>>>>>> 
>>>>>>>>>> when we have:
>>>>>>>>>> 
>>>>>>>>>>   117   enum {
>>>>>>>>>>   118     // max_symbol_length is constrained by type of _length
>>>>>>>>>>   119     max_symbol_length = (1 << 16) -1
>>>>>>>>>>   120   };
>>>>>>>>>> 
>>>>>>>>>> Wouldn’t that always be true?  Is it to make sure that nobody changes max_symbol_length, because the implementation needs it to be that? If so, should we add comment to:
>>>>>>>>>> 
>>>>>>>>>>   119     max_symbol_length = (1 << 16) -1
>>>>>>>>>> 
>>>>>>>>>> with a big warning of some sorts?
>>>>>>>>> 
>>>>>>>>> Yes, it's so that we can store the length of the symbol into 16 bits.
>>>>>>>>> 
>>>>>>>>> How I change the comment above max_symbol_length from:
>>>>>>>>> 
>>>>>>>>>     // max_symbol_length is constrained by type of _length
>>>>>>>>> 
>>>>>>>>> to
>>>>>>>>> 
>>>>>>>>>     // max_symbol_length must fit into the top 16 bits of _length_and_refcount
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> #3 If we have:
>>>>>>>>>> 
>>>>>>>>>>    39   STATIC_ASSERT(PERM_REFCOUNT == ((1 << 16) - 1));
>>>>>>>>>> 
>>>>>>>>>> then why not
>>>>>>>>>> 
>>>>>>>>>>   101 #define PERM_REFCOUNT ((1 << 16) - 1)) // 0xffff
>>>>>>>>>> 
>>>>>>>>>> or
>>>>>>>>>>    39   STATIC_ASSERT(PERM_REFCOUNT == 0xffff;
>>>>>>>>>>   101 #define PERM_REFCOUNT 0xffff
>>>>>>>>>> 
>>>>>>>>> I can change PERM_REFCOUNT to ((1 << 16)) -1) to be consistent.
>>>>>>>>> 
>>>>>>>>>> #4 We have:
>>>>>>>>>> 
>>>>>>>>>>   221 void Symbol::increment_refcount() {
>>>>>>>>>>   222   if (refcount() != PERM_REFCOUNT) { // not a permanent symbol
>>>>>>>>>>   223     if (!try_increment_refcount()) {
>>>>>>>>>>   224 #ifdef ASSERT
>>>>>>>>>>   225       print();
>>>>>>>>>>   226 #endif
>>>>>>>>>>   227       fatal("refcount has gone to zero");
>>>>>>>>>> 
>>>>>>>>>> but
>>>>>>>>>> 
>>>>>>>>>>   233 void Symbol::decrement_refcount() {
>>>>>>>>>>   234   if (refcount() != PERM_REFCOUNT) { // not a permanent symbol
>>>>>>>>>>   235     int new_value = Atomic::sub((uint32_t)1, &_length_and_refcount);
>>>>>>>>>>   236 #ifdef ASSERT
>>>>>>>>>>   237     // Check if we have transitioned to 0xffff
>>>>>>>>>>   238     if (extract_refcount(new_value) == PERM_REFCOUNT) {
>>>>>>>>>>   239       print();
>>>>>>>>>>   240       fatal("refcount underflow");
>>>>>>>>>>   241     }
>>>>>>>>>>   242 #endif
>>>>>>>>>> 
>>>>>>>>>> Where the line:
>>>>>>>>>> 
>>>>>>>>>>   240       fatal("refcount underflow”);
>>>>>>>>>> 
>>>>>>>>>> is inside #ifdef ASSERT, but:
>>>>>>>>>> 
>>>>>>>>>> 227       fatal("refcount has gone to zero”);
>>>>>>>>>> 
>>>>>>>>>> is outside. Shouldn't “fatal" be consistent in both?
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I was thought that looked strange too.  I'll move the #endif from 226 to after 227.
>>>>>>>>> 
>>>>>>>>> Thank you for reviewing the code!
>>>>>>>>> Coleen
>>>>>>>>> 
>>>>>>>>>> cheers
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Jul 17, 2018, at 10:51 AM, coleen.phillimore at oracle.com wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Summary: Use cmpxchg for non permanent symbol refcounting, and pack refcount and length into an int.
>>>>>>>>>>> 
>>>>>>>>>>> This is a precurser change to the concurrent SymbolTable change. Zeroed refcounted entries can be deleted at anytime so they cannot be allowed to be zero in runtime code. Thanks to Kim for writing the packing function and helping me avoid undefined behavior.
>>>>>>>>>>> 
>>>>>>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8207359.01/webrev
>>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8207359
>>>>>>>>>>> 
>>>>>>>>>>> Tested with solaris ptrace helper, mach5 tier1-5 including solaris. Added multithreaded gtest which exercises the code.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Coleen
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
    
    
More information about the hotspot-runtime-dev
mailing list