RFR (xs) 8145940: TempNewSymbol should have correct copy and assignment functions

Kim Barrett kim.barrett at oracle.com
Tue Jan 12 04:22:22 UTC 2016


On Jan 11, 2016, at 8:34 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
> 
> 
> Kim and Lois,
> 
> Thanks for looking at this.   This is my latest webrev with the ExecuteInternalVMTest in it:
> 
> http://cr.openjdk.java.net/~coleenp/8145940.02/webrev/index.html
> 
> See comments below.
> 
> On 1/11/16 6:45 PM, Kim Barrett wrote:
>> On Jan 7, 2016, at 3:02 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>> Summary: Add clear() to the assignment operator and add copy constructor.
>>> 
>>> Ran all jtreg, colocated and non-colocated tests.   RunThese -jck with PrintSymbolTableSizeHistogram statistics:
>>> 
>>>  Percent removed          1.35
>>>  Reference counts          194583
>>> 
>>> clean:
>>> 
>>>  Percent removed          1.53
>>>  Reference counts          194245
>>> 
>>> Without a reference counting copy constructor, we could remove a TempNewSymbol's Symbol if a GC happens.  Consider:
>>> 
>>> TempNewSymbol ts = SymbolTable::new_symbol("abc");
>>> // Hit GC
>>> 
>>> The ref count for "abc" is 1 when created by new_symbol, and the destructor could run after the copy into ts, decrementing the reference count to 0 again.  GC could unlink that symbol from the symbol table.  Fortunately, we haven't seen this bug.
>>> 
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8145940/
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8145940
>>> 
>>> Thanks,
>>> Coleen
>> For the assignment operator, I think better is:
>> 
>> void operator=(TempNewSymbol s) {
>>   Symbol* tmp = s._temp;
>>   s._temp = _temp;
>>   _temp = tmp;
>> }
>> 
>> This is the well-known copy and swap idiom, with a by-hand "swap".
>> 
>> http://stackoverflow.com/questions/3279543/what-is-the-copy-and-swap-idiom
> 
> Very tricky.  Lois and I figured out the trick.  It doesn't elide any Atomic::* reference counting though, which is what is worrysome for performance, just hides it in the destructor of 's'.  I don't want to change this.

Copy-swap saves several refcount manipulations in some (common) cases,
by taking advantage of rvalue copy elision.

Coleen's:

void operator=(const TempNewSymbol& s) {
  clear();
  _temp = s._temp;
  if (_temp != NULL) _temp->increment_refcount();
}

Markus's:

const TempNewSymbol& operator=(const TempNewSymbol& rhs) {
  if (this != &rhs && _temp != rhs._temp) {
    clear();
    _temp = rhs._temp;
    if (_temp != NULL) {
      _temp->increment_refcount();
    }
  }
  return *this;
}

Kim's:

void operator=(TempNewSymbol rhs) {
  Symbol* tmp = rhs._temp;
  rhs._temp = _temp;
  _temp = tmp;
}

[I'm ignoring differences in the return value in the discussion below.
I think it ought to be either "TempNewSymbol&" (not const) or void.]

Markus's is mostly the same as Coleen's, except it handles a couple of
special cases differently.

Markus's handles self-assignment, where Coleen's doesn't. For
correctness that's important; in actual practice self-assignment is
generally rare to non-existent, so always paying overhead for it is
unpleasant.

Markus's also specially handles re-assignment to the same symbol,
avoiding any refcount changes in that case.  I suggest this is another
rare case, and adding overhead in the normal case to get that rare
benefit is not the right tradeoff.

Assuming we're not in one of those special cases and neither Symbol*
is NULL, Coleen/Markus decrements the old and increments the new. And
in an important special case where rhs is an rvalue, there will be
another decrement as the temporary goes out of scope. So two refcount
changes when rhs is an lvalue, three when it's an rvalue.

In contrast, Kim's does an increment and a decrement when rhs is an
lvalue, and *only* a decrement when rhs is an rvalue and rvalue copy
elision is performed by the compiler, which any decent optimizing
compiler will do.

The rhs is an rvalue in the following situation:

  TempNewSymbol tns = ... ;
  ...
  tns = expr_returning_Symbol* ;

The rhs gets implicitly converted to a temporary (rvalue)
TempNewSymbol referring to the symbol (no refcount change), that
temporary is directly bound to the assignment argument (copy elided),
swap the symbols, and the rvalue converted temp goes out of scope and
decrements the refcount of the old Symbol*.

The copy-swap version also handles the (rare) self-assignment and
re-assignment to the same symbol cases. In these cases Kim's still
does the same refcounting changes as otherwise.  But it's better to do
extra work in the rare cases and keep the normal cases fast.

This is why copy-swap has become the "usual idiom"for assignment
operators. There might be cases where it's not the best answer, but
usually it is. And this is why (to answer Lois's issue) one should use
it even though it's not obvious if one is not previously familiar with
the idiom; one should instead learn the idiom, and then one spends
time looking hard at implementations that do something else.

Of course, this may all be entirely in the noise; I hope that
TempNewSymbol assignment is not a high frequency operation.


>> I think the updated description of TempNewSymbol isn't really right.
>> Because the conversion constructor doesn't increment the reference
>> count, it must *not* be used to capture some arbitrary reference to a
>> Symbol*. Only a new symbol or one obtained from one of the lookup
>> functions really should be managed by this class.
> 
> Yes, this is a subtlety of the design and one that I can't really fix right now.  I thought of changing TempNewSymbol to convert from const char * but there are cases where we don't want to add a symbol.
> 
> There is a comment above TempNewSymbol.  I don't know what to add to it.   And Markus's comments are really good.

I was confused about which webrev I was looking at.  Coleen’s doesn’t seem to change those comments.
It was (parts of) Markus’s comments that worried me.  And the whole design is tricky and risky.



More information about the hotspot-runtime-dev mailing list