RFR (xs) 8145940: TempNewSymbol should have correct copy and assignment functions
Markus Gronlund
markus.gronlund at oracle.com
Tue Jan 12 12:34:42 UTC 2016
Hi Kim,
Thanks a lot for the (detailed) reminder about the copy-and-swap idiom. I agree with your reasoning.
About the return value from operator=, I would suggest retaining the void as-is as well.
Maybe we should investigate if it is possible to have the functions in SymbolTable that increment Symbol*'s return a TempNewSymbol( or some other "smartptr" concept) instead of a direct Symbol*?
In that case we could make the type conversion constructor private and make SymbolTable a friend of TempNewSymbol - this would avoid accidental wrapping of arbitrary Symbol*'s. I will look into the possibility of this.
Thanks again for your clarification
Markus
-----Original Message-----
From: Kim Barrett
Sent: den 12 januari 2016 05:22
To: Coleen Phillimore
Cc: hotspot-runtime-dev
Subject: Re: RFR (xs) 8145940: TempNewSymbol should have correct copy and assignment functions
On Jan 11, 2016, at 8:34 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>
>
> Kim and Lois,
>
> Thanks for looking at this. This is my latest webrev with the ExecuteInternalVMTest in it:
>
> http://cr.openjdk.java.net/~coleenp/8145940.02/webrev/index.html
>
> See comments below.
>
> On 1/11/16 6:45 PM, Kim Barrett wrote:
>> On Jan 7, 2016, at 3:02 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>> Summary: Add clear() to the assignment operator and add copy constructor.
>>>
>>> Ran all jtreg, colocated and non-colocated tests. RunThese -jck with PrintSymbolTableSizeHistogram statistics:
>>>
>>> Percent removed 1.35
>>> Reference counts 194583
>>>
>>> clean:
>>>
>>> Percent removed 1.53
>>> Reference counts 194245
>>>
>>> Without a reference counting copy constructor, we could remove a TempNewSymbol's Symbol if a GC happens. Consider:
>>>
>>> TempNewSymbol ts = SymbolTable::new_symbol("abc"); // Hit GC
>>>
>>> The ref count for "abc" is 1 when created by new_symbol, and the destructor could run after the copy into ts, decrementing the reference count to 0 again. GC could unlink that symbol from the symbol table. Fortunately, we haven't seen this bug.
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8145940/
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8145940
>>>
>>> Thanks,
>>> Coleen
>> For the assignment operator, I think better is:
>>
>> void operator=(TempNewSymbol s) {
>> Symbol* tmp = s._temp;
>> s._temp = _temp;
>> _temp = tmp;
>> }
>>
>> This is the well-known copy and swap idiom, with a by-hand "swap".
>>
>> http://stackoverflow.com/questions/3279543/what-is-the-copy-and-swap-
>> idiom
>
> Very tricky. Lois and I figured out the trick. It doesn't elide any Atomic::* reference counting though, which is what is worrysome for performance, just hides it in the destructor of 's'. I don't want to change this.
Copy-swap saves several refcount manipulations in some (common) cases, by taking advantage of rvalue copy elision.
Coleen's:
void operator=(const TempNewSymbol& s) {
clear();
_temp = s._temp;
if (_temp != NULL) _temp->increment_refcount(); }
Markus's:
const TempNewSymbol& operator=(const TempNewSymbol& rhs) {
if (this != &rhs && _temp != rhs._temp) {
clear();
_temp = rhs._temp;
if (_temp != NULL) {
_temp->increment_refcount();
}
}
return *this;
}
Kim's:
void operator=(TempNewSymbol rhs) {
Symbol* tmp = rhs._temp;
rhs._temp = _temp;
_temp = tmp;
}
[I'm ignoring differences in the return value in the discussion below.
I think it ought to be either "TempNewSymbol&" (not const) or void.]
Markus's is mostly the same as Coleen's, except it handles a couple of special cases differently.
Markus's handles self-assignment, where Coleen's doesn't. For correctness that's important; in actual practice self-assignment is generally rare to non-existent, so always paying overhead for it is unpleasant.
Markus's also specially handles re-assignment to the same symbol, avoiding any refcount changes in that case. I suggest this is another rare case, and adding overhead in the normal case to get that rare benefit is not the right tradeoff.
Assuming we're not in one of those special cases and neither Symbol* is NULL, Coleen/Markus decrements the old and increments the new. And in an important special case where rhs is an rvalue, there will be another decrement as the temporary goes out of scope. So two refcount changes when rhs is an lvalue, three when it's an rvalue.
In contrast, Kim's does an increment and a decrement when rhs is an lvalue, and *only* a decrement when rhs is an rvalue and rvalue copy elision is performed by the compiler, which any decent optimizing compiler will do.
The rhs is an rvalue in the following situation:
TempNewSymbol tns = ... ;
...
tns = expr_returning_Symbol* ;
The rhs gets implicitly converted to a temporary (rvalue) TempNewSymbol referring to the symbol (no refcount change), that temporary is directly bound to the assignment argument (copy elided), swap the symbols, and the rvalue converted temp goes out of scope and decrements the refcount of the old Symbol*.
The copy-swap version also handles the (rare) self-assignment and re-assignment to the same symbol cases. In these cases Kim's still does the same refcounting changes as otherwise. But it's better to do extra work in the rare cases and keep the normal cases fast.
This is why copy-swap has become the "usual idiom"for assignment operators. There might be cases where it's not the best answer, but usually it is. And this is why (to answer Lois's issue) one should use it even though it's not obvious if one is not previously familiar with the idiom; one should instead learn the idiom, and then one spends time looking hard at implementations that do something else.
Of course, this may all be entirely in the noise; I hope that TempNewSymbol assignment is not a high frequency operation.
>> I think the updated description of TempNewSymbol isn't really right.
>> Because the conversion constructor doesn't increment the reference
>> count, it must *not* be used to capture some arbitrary reference to a
>> Symbol*. Only a new symbol or one obtained from one of the lookup
>> functions really should be managed by this class.
>
> Yes, this is a subtlety of the design and one that I can't really fix right now. I thought of changing TempNewSymbol to convert from const char * but there are cases where we don't want to add a symbol.
>
> There is a comment above TempNewSymbol. I don't know what to add to it. And Markus's comments are really good.
I was confused about which webrev I was looking at. Coleen's doesn't seem to change those comments.
It was (parts of) Markus's comments that worried me. And the whole design is tricky and risky.
More information about the hotspot-runtime-dev
mailing list