RFR (xs) 8145940: TempNewSymbol should have correct copy and assignment functions
Coleen Phillimore
coleen.phillimore at oracle.com
Tue Jan 12 21:38:28 UTC 2016
Kim, see below.
On 1/11/16 11:22 PM, Kim Barrett wrote:
> On Jan 11, 2016, at 8:34 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>
>> Kim and Lois,
>>
>> Thanks for looking at this. This is my latest webrev with the ExecuteInternalVMTest in it:
>>
>> http://cr.openjdk.java.net/~coleenp/8145940.02/webrev/index.html
>>
>> See comments below.
>>
>> On 1/11/16 6:45 PM, Kim Barrett wrote:
>>> On Jan 7, 2016, at 3:02 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>>> Summary: Add clear() to the assignment operator and add copy constructor.
>>>>
>>>> Ran all jtreg, colocated and non-colocated tests. RunThese -jck with PrintSymbolTableSizeHistogram statistics:
>>>>
>>>> Percent removed 1.35
>>>> Reference counts 194583
>>>>
>>>> clean:
>>>>
>>>> Percent removed 1.53
>>>> Reference counts 194245
>>>>
>>>> Without a reference counting copy constructor, we could remove a TempNewSymbol's Symbol if a GC happens. Consider:
>>>>
>>>> TempNewSymbol ts = SymbolTable::new_symbol("abc");
>>>> // Hit GC
>>>>
>>>> The ref count for "abc" is 1 when created by new_symbol, and the destructor could run after the copy into ts, decrementing the reference count to 0 again. GC could unlink that symbol from the symbol table. Fortunately, we haven't seen this bug.
>>>>
>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8145940/
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8145940
>>>>
>>>> Thanks,
>>>> Coleen
>>> For the assignment operator, I think better is:
>>>
>>> void operator=(TempNewSymbol s) {
>>> Symbol* tmp = s._temp;
>>> s._temp = _temp;
>>> _temp = tmp;
>>> }
>>>
>>> This is the well-known copy and swap idiom, with a by-hand "swap".
>>>
>>> http://stackoverflow.com/questions/3279543/what-is-the-copy-and-swap-idiom
>> Very tricky. Lois and I figured out the trick. It doesn't elide any Atomic::* reference counting though, which is what is worrysome for performance, just hides it in the destructor of 's'. I don't want to change this.
> Copy-swap saves several refcount manipulations in some (common) cases,
> by taking advantage of rvalue copy elision.
>
> Coleen's:
>
> void operator=(const TempNewSymbol& s) {
> clear();
> _temp = s._temp;
> if (_temp != NULL) _temp->increment_refcount();
> }
>
> Markus's:
>
> const TempNewSymbol& operator=(const TempNewSymbol& rhs) {
> if (this != &rhs && _temp != rhs._temp) {
> clear();
> _temp = rhs._temp;
> if (_temp != NULL) {
> _temp->increment_refcount();
> }
> }
> return *this;
> }
>
> Kim's:
>
> void operator=(TempNewSymbol rhs) {
> Symbol* tmp = rhs._temp;
> rhs._temp = _temp;
> _temp = tmp;
> }
>
> [I'm ignoring differences in the return value in the discussion below.
> I think it ought to be either "TempNewSymbol&" (not const) or void.]
>
> Markus's is mostly the same as Coleen's, except it handles a couple of
> special cases differently.
Yes, I adopted Markus's version in my last change.
> Markus's handles self-assignment, where Coleen's doesn't. For
> correctness that's important; in actual practice self-assignment is
> generally rare to non-existent, so always paying overhead for it is
> unpleasant.
>
> Markus's also specially handles re-assignment to the same symbol,
> avoiding any refcount changes in that case. I suggest this is another
> rare case, and adding overhead in the normal case to get that rare
> benefit is not the right tradeoff.
>
> Assuming we're not in one of those special cases and neither Symbol*
> is NULL, Coleen/Markus decrements the old and increments the new. And
> in an important special case where rhs is an rvalue, there will be
> another decrement as the temporary goes out of scope. So two refcount
> changes when rhs is an lvalue, three when it's an rvalue.
>
> In contrast, Kim's does an increment and a decrement when rhs is an
> lvalue, and *only* a decrement when rhs is an rvalue and rvalue copy
> elision is performed by the compiler, which any decent optimizing
> compiler will do.
In my small test case, copy constructor elision is not done for either
clang++ or g++ -O3 (attached). So the ref counting is still the same.
Or I'm counting wrong.
>
> The rhs is an rvalue in the following situation:
>
> TempNewSymbol tns = ... ;
> ...
> tns = expr_returning_Symbol* ;
>
> The rhs gets implicitly converted to a temporary (rvalue)
> TempNewSymbol referring to the symbol (no refcount change), that
> temporary is directly bound to the assignment argument (copy elided),
> swap the symbols, and the rvalue converted temp goes out of scope and
> decrements the refcount of the old Symbol*.
>
> The copy-swap version also handles the (rare) self-assignment and
> re-assignment to the same symbol cases. In these cases Kim's still
> does the same refcounting changes as otherwise. But it's better to do
> extra work in the rare cases and keep the normal cases fast.
>
> This is why copy-swap has become the "usual idiom"for assignment
> operators. There might be cases where it's not the best answer, but
> usually it is. And this is why (to answer Lois's issue) one should use
> it even though it's not obvious if one is not previously familiar with
> the idiom; one should instead learn the idiom, and then one spends
> time looking hard at implementations that do something else.
I don't mind learning new C++ or using "usual idioms" I'd have to
comment it to prevent someone from wasting time puzzling over it. If
C++11 further optimizes away the copy constructors, that would be great.
Coleen
>
> Of course, this may all be entirely in the noise; I hope that
> TempNewSymbol assignment is not a high frequency operation.
>
>
>>> I think the updated description of TempNewSymbol isn't really right.
>>> Because the conversion constructor doesn't increment the reference
>>> count, it must *not* be used to capture some arbitrary reference to a
>>> Symbol*. Only a new symbol or one obtained from one of the lookup
>>> functions really should be managed by this class.
>> Yes, this is a subtlety of the design and one that I can't really fix right now. I thought of changing TempNewSymbol to convert from const char * but there are cases where we don't want to add a symbol.
>>
>> There is a comment above TempNewSymbol. I don't know what to add to it. And Markus's comments are really good.
> I was confused about which webrev I was looking at. Coleen’s doesn’t seem to change those comments.
> It was (parts of) Markus’s comments that worried me. And the whole design is tricky and risky.
>
-------------- next part --------------
extern "C" int printf(const char *,...);
class Symbol {
const char* _value;
public:
Symbol(const char* v) : _value(v) {}
const char* value() { return this != 0 ? _value : "null"; }
};
class TempNewSymbol {
public:
Symbol* _sym;
TempNewSymbol() : _sym(0) { printf("default ctor called\n"); }
TempNewSymbol(Symbol* s) : _sym(s) { printf("convert ctor called %s\n", _sym->value()); }
~TempNewSymbol() { printf("dtor called %s\n", _sym->value()); }
TempNewSymbol(const TempNewSymbol& ts) {
_sym = ts._sym;
printf("copy constructor called %s\n", _sym->value());
}
#if 0
const TempNewSymbol& operator=(const TempNewSymbol& ts) {
printf("assignment operator called %s\n", _sym->value());
_sym = ts._sym;
}
#endif
void operator=(TempNewSymbol rhs) {
printf("assignment operator called %s\n", _sym->value());
Symbol* tmp = rhs._sym;
rhs._sym = _sym;
_sym = tmp;
}
};
int main() {
Symbol* s = new Symbol("abc");
TempNewSymbol ss = s;
TempNewSymbol s1 = new Symbol("efg");
TempNewSymbol s2 = new Symbol("hij");
printf("s2 is hij=%s\n", s2._sym->value());
s1 = s2;
printf("s1 is hij=%s\n", s1._sym->value());
s1 = s1;
TempNewSymbol s3;
s3 = new Symbol("klm");
printf("s3 is klm=%s\n", s3._sym->value());
}
More information about the hotspot-runtime-dev
mailing list